Probability question

Ximenean · August 12, 2003, 3:14pm

Say you have an s-sided dice with the numbers 1 to s written on its faces, and you roll it r times. What is the probability that each number will appear at least once?
I think I have an approximation that works OK for large values of s and r:

(1-((s-1)/s)[sup]r[/sup])[sup]s[/sup]

but I would like an exact formula. And no, this is not homework, those days are long gone.

CalMeacham · August 12, 2003, 3:20pm

It seems to me that, in the limit at r approaches infinity your probability ought to approach 1. But if I let r gop to infinity in your formula I seem to get zero. Something seems wrong.

MC_Master_of_Ceremonies · August 12, 2003, 3:31pm

At first glance it looks to me that the fromula should have a denominator of s[sup]r[/sup] and also any probailty for r<s should = 0 more complete answer to follow.

ultrafilter · August 12, 2003, 4:21pm

For r < s, the formula should be zero.

The_Great_Unwashed · August 12, 2003, 4:56pm

Quickly, and off the top of my head before I head home…

Successful sequences of throws will look like this:

1,2,3,… …,s-2,s-1,s,x,x,x,… …,x,x,x

Where we don’t care what order they come in and we are indifferent to what values x takes (there are r-s "x"s).

What is the number of ways to achieve this?

1…s in ANY order is s!

And these s elements are distributed in r slots, is that rCs? )(=r!/s!(r-s)!)

So the total number of successes is s!rCs

There are s[sup]r[/sup] different sequences of throws in total, giving:

P(hit all) = s!rCs/s[sup]r[/sup]

If that’s right, I’ll eat my hat (or buy myself a beer), but it should give you some pointers as to how you might calculate it…

Ximenean · August 12, 2003, 4:56pm

No, it tends to 1 since (s-1)/s is less than 1. But anyway, that formula is only an approximation. Obviously, the probability should be 0 when r<s.

MC_Master_of_Ceremonies · August 12, 2003, 5:33pm

It’s wrong I’m afraid, but yes it’s the same as it’s the same as the answer I thought ([sup]r[/sup]P[sub]s[/sub]/s[sup]r[/sup]), but checking it out it doesn’t seem to work.

MC_Master_of_Ceremonies · August 12, 2003, 5:39pm

I know when s=r the probailty is s!/s[sup]r[/sup], but I’m having trouble from there.

Grey · August 12, 2003, 5:39pm

(s-1)^r-1/s^r which becomes 1/s for r=1 and 0 for r = oo

Ximenean · August 12, 2003, 5:46pm

I think the problem with The Great Unwashed’s approach is that the sequence 12234 (for example) would be counted twice (12234 and 12234).

Grey · August 12, 2003, 6:00pm

Damn! Forgot the r term

For a single throw of the dice the P(x|1) = 1/s right?

Now for two throws there will be 2 way for a digit to appear 2 P(x|1)p(!x|1) which is 2 (1/s)(1-1/s) so lets extend that to r throws you’ll only ever get r instances where a number appears only once so P = r(1/s)(1-1/s)^(r-1) -> r(1/s)*((s-1)/s)^(r-1) which eventually gives you:

r*(s-1)^(r-1)/s^r

Now for r= 1 this goes to 1/s and for oo it goes to 0.

Ximenean · August 12, 2003, 6:10pm

You’re saying that the more you roll the dice, the less likely it becomes that you have rolled every number?

Grey · August 12, 2003, 6:14pm

Oh every number…

I did it for a single occurrence. Sorry about the added noise. I did need the mental break though. Sorry about that. :smack:

Ximenean · August 12, 2003, 6:18pm

Hmm… I see now that the original question might be ambiguous. Let me restate it: What is the probability that every number will appear at least once?

Grey · August 12, 2003, 6:45pm

No your OP was fine, I’m having a bad enough week that this was a nice break.

The_Ryan · August 12, 2003, 8:38pm

I think it’s obvious that the denominator will be s[sup]r[/sup] (at least before cancellations, if any). Agreed? So the difficult part will be the numerator, which will be equal to the number of combinations with at least one of each. But let’s consider an different question: how many ways are there of not getting one of each? Well, WLOG we can say that there are no 1s. There are s-1 numbers left. So the number of combinations is (s-1)[sup]r[/sup]. Now how many combination don’t have 2? Well, at first we’d think it’s (s-1)[sup]r[/sup] again. But that would count every combination with neither 1 nor 2 twice. This is where it gets really complicated. We start with the number s(s-1)[sup]r[/sup]: this is the number of not-all-the-numbers-show-up combinations, counted for each number. Some of the combinations show up twice. Some show up three times, and four times, etc. So what I’m going to do is first take the 1st multiplicities (s(s-1)[sup]r[/sup]), subtract the second multiplicites, add back the third, etc. If I remember my combinatorics correctly, this should yield the correct number. Anyone who doesn’t understand what I’m doing, or thinks that it’s in error, is free to ask me to justify this step, but I think that should be saved for a different post.
1: m1=s(s-1)[sup]r[/sup] (already determined)
2: To be counted twice, it has to show up as having not included two different numbers. There are sC2 different choices for the two numbers. Given two numbers, the number of combinations is (s-2)[sup]r[/sup]. So m2=sC2(s-2)[sup]r[/sup].
3: Following the logic above, m3=sC3(s-3)[sup]r[/sup]
In general, m[sub]i[/sub]=sCi(s-i)[sup]r[/sup]

So the probability of every number showing up is the very messy formula below:
1-(sum(for i=1 to (s-1) of [ (-1)[sup]i+1/sup(s-i)[sup]r[/sup] ] ))/s[sup]r[/sup].
(For r>0)
Which actually is the same as:
sum(for i=0 to (s-1) of[ (-1)[sup]i/sup(s-i)[sup]r[/sup] ] )/s[sup]r[/sup].

I don’t see any way of simplifying further it at the moment. But let’s try it out on s=6, r=1.

(6-30+60-60+30-6)/6=0
Which is what we expect.

Using excel, I got the following numbers for s=6:
r=1 p=0
r=2 p=0
r=3 p=0
r=4 p=0
r=5 p=0
r=6 p=0.015432
r=7 p=0.054012
r=8 p=0.114026
r=15 p=.644213
r=20 p=.847988
r=30 p=.974802

MC_Master_of_Ceremonies · August 12, 2003, 8:40pm

This problems been annoying me, I’ve found these formulas which I should be able to put into a more general formula

when r = s, the probabilty*s[sup]r[/sup] is r! (the number of permutations when n=r)

when r = s + 1, the probailty*s[sup]r[/sup] is sr!/[(r-s)+1]! (the number of permutations for one set of identical items)

when r = s + 2, the probailty*s[sup]r[/sup] is sr!/[(r-s)+1]! + sr!/[(r-s)!][sup]r-s[/sup] (the number of permutations for one set of identical ite,ms plus the number of permutations for two sets of identical items)

I’ve used the formulas for the number of permutations for one and two sets of identical items (n!/p! and n!/p!q!) to get these to account for the fact.

There must be a simplier way!

borschevsky · August 12, 2003, 8:47pm

I think this is going to be a pretty complicated problem. I’ll put down some ideas and see what comes up.

Let X[sub]i,j,k[/sub] be the number of different ways to get exactly i different numbers when you throw a j-sided die k times. In the OP’s case, we are looking for X[sub]s,s,r[/sub]: the number of ways to get exactly s different numbers when throwing an s-sided die r times.

Once you have that, then the probability is X[sub]s,s,r[/sub] / s[sup]r[/sup].

Assume nice values for i,j,k, so:
0 < i <= j (have to roll at least 1 distinct number, and not more distinct numbers than there are sides)
j <= k (always roll at least as many times as there are sides, just to keep things nice)

We’ll try an example. How many ways to get exactly 4 different numbers from throwing a 6-sided die 8 times? X[sub]4,6,8[/sub]:

There are [sub]6[/sub]C[sub]4[/sub] different sets of 4 different numbers we could get. For each of these, there are 4[sup]8[/sup] different sets of 8 rolls. But those 4[sup]8[/sup] sets of rolls will contain some sets where only 3, 2, or 1 different numbers came up, so we need to get rid of those. How many of our 4[sup]8[/sup] had exactly 3 different numbers? X[sub]3,4,8[/sub]. Similarly for 2 and 1.

So X[sub]4,6,8[/sub] = [sub]6[/sub]C[sub]4[/sub] ( 4[sup]8[/sup] - X[sub]3,4,8[/sub] - X[sub]2,4,8[/sub] - X[sub]1,4,8[/sub] )

And (I think) in general:
X[sub]i,j,k[/sub] = [sub]j[/sub]C[sub]i[/sub] ( i[sup]k[/sup] - SUM[sub]z=1…i-1[/sub]X[sub]z,j,k[/sub] )

And we need a terminating condition for the recursion, so X[sub]1,j,k[/sub] = j.

Ok, now a really simple calculation to try this. Flip a coin twice; i.e., roll a 2-sided die twice. The outcomes are HH,HT,TH,TT, 2 of which have exactly 2 different values, so X[sub]2,2,2[/sub] should be 2.

X[sub]2,2,2[/sub] = [sub]2[/sub]C[sub]2[/sub] ( 2[sup]2[/sup] - X[sub]1,2,2[/sub] )
X[sub]2,2,2[/sub] = 1 ( 4 - 2 )
X[sub]2,2,2[/sub] = 2

MC_Master_of_Ceremonies · August 12, 2003, 8:49pm

It looks like ryan’s right his values agree with my formulas which were worked out in approaching the problem from a different angle.

borschevsky · August 12, 2003, 9:35pm

There was a typo in my formula (see if you can find where I fixed it :)). It should be:

X[sub]i,j,k[/sub] = [sub]j[/sub]C[sub]i[/sub] ( i[sup]k[/sup] - SUM[sub]z=1…i-1[/sub]X[sub]z,i,k[/sub] )

And my results are the same:
6,6,6, 0.01543209
6,6,7, 0.05401234
6,6,8, 0.11402606
6,6,15, 0.64421273
6,6,20, 0.84798754
6,6,30, 0.97480188

so it looks like the three of us all have correct methods. I don’t see any way to simplify mine.

Topic		Replies	Views
Yet another probability/combinatorics question involving dice... Factual Questions	12	2315	February 18, 2009
Quick stat prob Q Factual Questions	11	887	February 24, 2004
Dice Probability Factual Questions	12	988	September 3, 2002
Dice rolling probability question Factual Questions	11	1109	January 16, 2018
A (simple ?) probability question. Factual Questions	54	6715	April 25, 2013

Probability question

Related topics