Probability question

Say you have an s-sided dice with the numbers 1 to s written on its faces, and you roll it r times. What is the probability that each number will appear at least once?
I think I have an approximation that works OK for large values of s and r:

(1-((s-1)/s)[sup]r[/sup])[sup]s[/sup]

but I would like an exact formula. And no, this is not homework, those days are long gone.

It seems to me that, in the limit at r approaches infinity your probability ought to approach 1. But if I let r gop to infinity in your formula I seem to get zero. Something seems wrong.

At first glance it looks to me that the fromula should have a denominator of s[sup]r[/sup] and also any probailty for r<s should = 0 more complete answer to follow.

For r < s, the formula should be zero.

Quickly, and off the top of my head before I head home…

Successful sequences of throws will look like this:

1,2,3,… …,s-2,s-1,s,x,x,x,… …,x,x,x

Where we don’t care what order they come in and we are indifferent to what values x takes (there are r-s "x"s).

What is the number of ways to achieve this?

1…s in ANY order is s!

And these s elements are distributed in r slots, is that rCs? )(=r!/s!(r-s)!)

So the total number of successes is s!rCs

There are s[sup]r[/sup] different sequences of throws in total, giving:

P(hit all) = s!rCs/s[sup]r[/sup]

If that’s right, I’ll eat my hat (or buy myself a beer), but it should give you some pointers as to how you might calculate it…

No, it tends to 1 since (s-1)/s is less than 1. But anyway, that formula is only an approximation. Obviously, the probability should be 0 when r<s.

It’s wrong I’m afraid, but yes it’s the same as it’s the same as the answer I thought ([sup]r[/sup]P[sub]s[/sub]/s[sup]r[/sup]), but checking it out it doesn’t seem to work.

I know when s=r the probailty is s!/s[sup]r[/sup], but I’m having trouble from there.

(s-1)^r-1/s^r which becomes 1/s for r=1 and 0 for r = oo

I think the problem with The Great Unwashed’s approach is that the sequence 12234 (for example) would be counted twice (12234 and 12234).

Damn! Forgot the r term

For a single throw of the dice the P(x|1) = 1/s right?

Now for two throws there will be 2 way for a digit to appear 2 P(x|1)p(!x|1) which is 2 (1/s)(1-1/s) so lets extend that to r throws you’ll only ever get r instances where a number appears only once so P = r(1/s)(1-1/s)^(r-1) -> r(1/s)*((s-1)/s)^(r-1) which eventually gives you:

r*(s-1)^(r-1)/s^r

Now for r= 1 this goes to 1/s and for oo it goes to 0.

You’re saying that the more you roll the dice, the less likely it becomes that you have rolled every number?

Oh every number

I did it for a single occurrence. Sorry about the added noise. I did need the mental break though. Sorry about that. :smack:

Hmm… I see now that the original question might be ambiguous. Let me restate it: What is the probability that every number will appear at least once?

No your OP was fine, I’m having a bad enough week that this was a nice break.

I think it’s obvious that the denominator will be s[sup]r[/sup] (at least before cancellations, if any). Agreed? So the difficult part will be the numerator, which will be equal to the number of combinations with at least one of each. But let’s consider an different question: how many ways are there of not getting one of each? Well, WLOG we can say that there are no 1s. There are s-1 numbers left. So the number of combinations is (s-1)[sup]r[/sup]. Now how many combination don’t have 2? Well, at first we’d think it’s (s-1)[sup]r[/sup] again. But that would count every combination with neither 1 nor 2 twice. This is where it gets really complicated. We start with the number s(s-1)[sup]r[/sup]: this is the number of not-all-the-numbers-show-up combinations, counted for each number. Some of the combinations show up twice. Some show up three times, and four times, etc. So what I’m going to do is first take the 1st multiplicities (s(s-1)[sup]r[/sup]), subtract the second multiplicites, add back the third, etc. If I remember my combinatorics correctly, this should yield the correct number. Anyone who doesn’t understand what I’m doing, or thinks that it’s in error, is free to ask me to justify this step, but I think that should be saved for a different post.
1: m1=s(s-1)[sup]r[/sup] (already determined)
2: To be counted twice, it has to show up as having not included two different numbers. There are sC2 different choices for the two numbers. Given two numbers, the number of combinations is (s-2)[sup]r[/sup]. So m2=sC2(s-2)[sup]r[/sup].
3: Following the logic above, m3=sC3(s-3)[sup]r[/sup]
In general, m[sub]i[/sub]=sCi(s-i)[sup]r[/sup]

So the probability of every number showing up is the very messy formula below:
1-(sum(for i=1 to (s-1) of [ (-1)[sup]i+1/sup(s-i)[sup]r[/sup] ] ))/s[sup]r[/sup].
(For r>0)
Which actually is the same as:
sum(for i=0 to (s-1) of[ (-1)[sup]i/sup(s-i)[sup]r[/sup] ] )/s[sup]r[/sup].

I don’t see any way of simplifying further it at the moment. But let’s try it out on s=6, r=1.

(6-30+60-60+30-6)/6=0
Which is what we expect.

Using excel, I got the following numbers for s=6:
r=1 p=0
r=2 p=0
r=3 p=0
r=4 p=0
r=5 p=0
r=6 p=0.015432
r=7 p=0.054012
r=8 p=0.114026
r=15 p=.644213
r=20 p=.847988
r=30 p=.974802

This problems been annoying me, I’ve found these formulas which I should be able to put into a more general formula

when r = s, the probabilty*s[sup]r[/sup] is r! (the number of permutations when n=r)

when r = s + 1, the probailty*s[sup]r[/sup] is sr!/[(r-s)+1]! (the number of permutations for one set of identical items)

when r = s + 2, the probailty*s[sup]r[/sup] is sr!/[(r-s)+1]! + sr!/[(r-s)!][sup]r-s[/sup] (the number of permutations for one set of identical ite,ms plus the number of permutations for two sets of identical items)

I’ve used the formulas for the number of permutations for one and two sets of identical items (n!/p! and n!/p!q!) to get these to account for the fact.

There must be a simplier way!

I think this is going to be a pretty complicated problem. I’ll put down some ideas and see what comes up.

Let X[sub]i,j,k[/sub] be the number of different ways to get exactly i different numbers when you throw a j-sided die k times. In the OP’s case, we are looking for X[sub]s,s,r[/sub]: the number of ways to get exactly s different numbers when throwing an s-sided die r times.

Once you have that, then the probability is X[sub]s,s,r[/sub] / s[sup]r[/sup].

Assume nice values for i,j,k, so:
0 < i <= j (have to roll at least 1 distinct number, and not more distinct numbers than there are sides)
j <= k (always roll at least as many times as there are sides, just to keep things nice)

We’ll try an example. How many ways to get exactly 4 different numbers from throwing a 6-sided die 8 times? X[sub]4,6,8[/sub]:

There are [sub]6[/sub]C[sub]4[/sub] different sets of 4 different numbers we could get. For each of these, there are 4[sup]8[/sup] different sets of 8 rolls. But those 4[sup]8[/sup] sets of rolls will contain some sets where only 3, 2, or 1 different numbers came up, so we need to get rid of those. How many of our 4[sup]8[/sup] had exactly 3 different numbers? X[sub]3,4,8[/sub]. Similarly for 2 and 1.

So X[sub]4,6,8[/sub] = [sub]6[/sub]C[sub]4[/sub] ( 4[sup]8[/sup] - X[sub]3,4,8[/sub] - X[sub]2,4,8[/sub] - X[sub]1,4,8[/sub] )

And (I think) in general:
X[sub]i,j,k[/sub] = [sub]j[/sub]C[sub]i[/sub] ( i[sup]k[/sup] - SUM[sub]z=1…i-1[/sub]X[sub]z,j,k[/sub] )

And we need a terminating condition for the recursion, so X[sub]1,j,k[/sub] = j.

Ok, now a really simple calculation to try this. Flip a coin twice; i.e., roll a 2-sided die twice. The outcomes are HH,HT,TH,TT, 2 of which have exactly 2 different values, so X[sub]2,2,2[/sub] should be 2.

X[sub]2,2,2[/sub] = [sub]2[/sub]C[sub]2[/sub] ( 2[sup]2[/sup] - X[sub]1,2,2[/sub] )
X[sub]2,2,2[/sub] = 1 ( 4 - 2 )
X[sub]2,2,2[/sub] = 2

It looks like ryan’s right his values agree with my formulas which were worked out in approaching the problem from a different angle.

There was a typo in my formula (see if you can find where I fixed it :)). It should be:

X[sub]i,j,k[/sub] = [sub]j[/sub]C[sub]i[/sub] ( i[sup]k[/sup] - SUM[sub]z=1…i-1[/sub]X[sub]z,i,k[/sub] )

And my results are the same:
6,6,6, 0.01543209
6,6,7, 0.05401234
6,6,8, 0.11402606
6,6,15, 0.64421273
6,6,20, 0.84798754
6,6,30, 0.97480188

so it looks like the three of us all have correct methods. I don’t see any way to simplify mine.