Probability formula for number of rolls to get a particular result

If I have an N-sided die and I want to hit a particular number in that range 1-N, what is the formula for the average or expected number of times I’ll have to roll it before I get the one I want?

Short answer: N.

The number of rolls it takes until you first get a particular value follows a geometric distribution.

The mean, or expected value, of such a distribution is 1/p. In this case, assuming all sides of the die are equally likely to come up, p = 1/N, so the mean is 1/(1/N) = N.
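
As a quick sanity check, here's a Monte Carlo sketch in Python; the die size, seed, trial count, and function name are my own choices, not anything from the thread:

```python
import random

def rolls_until(target, sides, rng):
    """Count rolls of a fair `sides`-sided die until `target` first appears."""
    count = 0
    while True:
        count += 1
        if rng.randint(1, sides) == target:
            return count

rng = random.Random(0)
N, trials = 6, 100_000
avg = sum(rolls_until(6, N, rng) for _ in range(trials)) / trials
print(avg)  # close to the predicted mean of N = 6
```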

(I assume that if I am mistaken about any of this, other Dopers will be along to correct me.)

Thanks, both of you. Very helpful.

Of course, even though the mean is N, in practice it can take significantly more or fewer rolls than that. In particular, if you’re rolling until you’ve hit all of the numbers, that’ll usually take a lot longer than N. Whatever the last one is, it’ll seem like the die is biased against that number, but really, something has to be the last one.

IIRC, this is called the “coupon collector problem”.

The coupon collector problem is what you get when you add up a bunch of those waiting times, one for each number. For a single number, people usually talk about a geometric distribution. The point is that it has a long tail.
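
For the coupon-collector version (rolling until every face has appeared), the expected wait is N times the N-th harmonic number, 6·(1 + 1/2 + … + 1/6) = 14.7 for a six-sided die. A simulation sketch, with my own names and parameters:

```python
import random

def rolls_to_see_all(sides, rng):
    """Roll a fair die until every face has appeared at least once."""
    seen, count = set(), 0
    while len(seen) < sides:
        count += 1
        seen.add(rng.randint(1, sides))
    return count

rng = random.Random(0)
N, trials = 6, 50_000
avg = sum(rolls_to_see_all(N, rng) for _ in range(trials)) / trials
expected = N * sum(1 / k for k in range(1, N + 1))  # N * H_N = 14.7
print(avg, expected)
```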

You got the intuitive answers. But here’s the more mathematical breakdown (Doing it the brute-force way. I’m sure there’s something more elegant.).

The expected value of the number of rolls to hit a particular number on a 6-sided die is a sum that looks like this:
E = 1\cdot\frac{1}{6} + 2\cdot\frac{1}{6}\cdot\frac{5}{6} + 3\cdot\frac{1}{6}\left(\frac{5}{6}\right)^2 + \ldots

I.e., 1 times the odds of hitting it on the first roll; 2 times the odds of hitting it on the second, but not the first; and so on.

Generalized and put in summation form:
E = \sum_{k=1}^{\infty} k \frac{1}{N} (\frac{N-1}{N})^{k-1}

Rearrange things a bit:
N E = \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k-1}

N E = 1 (\frac{N-1}{N})^0 + \sum_{k=2}^{\infty} k (\frac{N-1}{N})^{k-1}

N E = 1 + \sum_{k=2}^{\infty} k (\frac{N-1}{N})^{k-1}

Renumber k:
N E = 1 + \sum_{k=1}^{\infty} (k + 1) (\frac{N-1}{N})^{k}

N E = 1 + \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k} + \sum_{k=1}^{\infty} 1 (\frac{N-1}{N})^{k}

N E = 1 + \frac{N-1}{N} \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k-1} + \sum_{k=1}^{\infty} 1 (\frac{N-1}{N})^{k}

Key step here! Recognize that the term in the middle is equal to NE, from way back:
N E = 1 + \frac{N-1}{N} NE + \sum_{k=1}^{\infty} 1 (\frac{N-1}{N})^{k}

N E = 1 + E (N-1) + \sum_{k=1}^{\infty} (\frac{N-1}{N})^{k}

Next key step! The other sum is an ordinary geometric series; with x = \frac{N-1}{N}, it equals \frac{x}{1-x}:
N E = 1 + E (N-1) + \frac{\frac{N-1}{N}}{1- \frac{N-1}{N}}

N E = 1 + E (N-1) + (N - 1)

N E = 1 + EN - E + N - 1

The NE on the left cancels the EN on the right, leaving 0 = N - E, so:

E = N

QED
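
The series itself is easy to check numerically; here's a small sketch (the truncation point and the sample values of N are arbitrary choices of mine):

```python
def partial_expectation(N, terms):
    """Truncated sum of E = sum_{k>=1} k * (1/N) * ((N-1)/N)**(k-1)."""
    p, q = 1 / N, (N - 1) / N
    return sum(k * p * q ** (k - 1) for k in range(1, terms + 1))

for N in (4, 6, 20):
    print(N, partial_expectation(N, 10_000))  # each close to N
```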

If you are going to recognize geometric series, you may as well note that

\sum_{k\ge0}x^k = \frac{1}{1-x}

so

\frac{d}{dx}\sum_{k\ge0}x^k = \sum_{k\ge0}kx^{k-1} = \frac{1}{(1-x)^2}

and

\sum_{k\ge0}kx^k=\frac{x}{(1-x)^2}.
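
Those identities are easy to spot-check numerically for |x| < 1; a sketch, with a truncation point I picked arbitrarily:

```python
def sum_k_xk(x, terms=10_000):
    """Truncated sum of k * x**k for k >= 0."""
    return sum(k * x ** k for k in range(terms))

for x in (0.1, 0.5, 5 / 6):
    closed_form = x / (1 - x) ** 2
    print(x, sum_k_xk(x), closed_form)  # the two agree closely
```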

More generally you can expand (1-x)^{-a}, and there is a lot you can do to manipulate sums using hypergeometric functions.

Nice one. I was wondering how to get that k out in front and didn’t think about using the derivative. That definitely would have saved some steps. Well, at least the proof I gave didn’t require calculus…

It is a mistake to disregard any kind of calculus or complex analysis when dealing with probability, but in this case what we are dealing with is the binomial series, which may be widely familiar.

Most certainly. Nothing against calculus of course, but I think it’s still useful to have a purely algebraic answer in case the OP’s abilities stop there. Speaking personally, while I find calculus-based steps like your example are easy to understand, actually using them as a tool when constructing proofs is much more difficult. It takes much more of a proof mindset than I have the practice for.

Apologies, especially since I accidentally posted the wrong graph!!! Let me make my own plots:

Let p = 1/6.

This is the geometric distribution.

The y-axis is the probability mass. The blue line tends to zero exponentially, though there is still a substantial tail past the expectation, which in this case is 6. Also NB: only positive integer values of x are relevant here, even though I drew a smooth line through all the points.
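
That tail is easy to quantify: the chance of needing more than m rolls is ((N-1)/N)^m, i.e. the chance of m straight misses. A quick sketch (my own helper name):

```python
def tail(N, m):
    """P(first hit takes more than m rolls) = ((N-1)/N)**m."""
    return ((N - 1) / N) ** m

for m in (6, 12, 24):
    print(m, round(tail(6, m), 4))
# Even after 6 rolls (the mean), there is still roughly a 33% chance
# of not having hit the target number yet.
```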

It’s much, much easier to prove. Imagine that you make many, many rolls of the die (like, say, 6n, in the limit as n → infinity). How many 6s will you get? n, of course, by the definition of probability. And if you have n 6s in 6n rolls, then there must, on average, be 6 rolls in between.
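
That frequency argument shows up directly in a simulated run; the sequence length and seed below are arbitrary choices of mine:

```python
import random

rng = random.Random(2)
N, T = 6, 600_000
rolls = [rng.randint(1, N) for _ in range(T)]

positions = [i for i, r in enumerate(rolls) if r == 6]
gaps = [b - a for a, b in zip(positions, positions[1:])]
freq = len(positions) / T          # fraction of rolls that are 6s, near 1/6
avg_gap = sum(gaps) / len(gaps)    # average spacing between 6s, near 6
print(freq, avg_gap)
```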

Typical dope response to a simple probability question.

The correct one-letter answer is given in post 2, followed by 15 posts deconstructing the problem in every way possible. Alas, leaving very little left for me to add my own two cents. :frowning:

God I love this place, at long last I’ve found my tribe. :smile:

That is the intuitive answer that I referenced.

And yet there is something mysterious about it to me. If you consider a long random sequence, the gaps between occurrences of the same number clearly must average 6. But if I “join” that sequence at a random place, shouldn’t I land, on average, about halfway through a gap? Why is the number of rolls until the number is hit not half the average gap length?

If you have a sequence that is exactly one six every sixth roll, then if you pick a random starting point, there’s an equal chance that there’s 1, 2, 3, 4, or 5 rolls to the next 6 - and the average of that is 3. But we’re looking at a sequence that averages one six every sixth roll, so if you start at random in that sequence, there’s a chance that there’s 1, 2, 3, 4, 5, 6, 7, 8, 9, and so on, rolls before the next six - and the average of that turns out to be 6 (from the arguments above).

I agree with that, and it’s more or less what I came up with. But this is rather hand-wavey:

there’s a chance that there’s 1, 2, 3, 4, 5, 6, 7, 8, 9, and so on, rolls before the next six - and the average of that turns out to be 6 (from the arguments above).

The answer is straightforward mathematically. But it’s not immediately obvious why the weighted average of all possible roll gaps from an arbitrary starting point should equal the average gap length.

Fair enough. It is weird that if you pick an arbitrary startpoint in the middle of a long sequence of rolls, the average number of rolls to the next six is the same as the average number of rolls since the last six.
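
That symmetry can be checked by simulation: join a long run at random spots and measure the wait forward to the next 6 and backward to the most recent 6. The sequence length, seed, and margins below are arbitrary choices of mine:

```python
import random

rng = random.Random(1)
N = 6
seq = [rng.randint(1, N) for _ in range(1_000_000)]

samples = 50_000
fwd_total = bwd_total = 0
for _ in range(samples):
    i = rng.randrange(300, len(seq) - 300)  # stay clear of the ends
    j = i
    while seq[j] != 6:          # rolls from i (inclusive) to the next 6
        j += 1
    fwd_total += j - i + 1
    j = i - 1
    while seq[j] != 6:          # rolls back from i to the most recent 6
        j -= 1
    bwd_total += i - j

print(fwd_total / samples, bwd_total / samples)  # both near 6, not 3
```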

Yeah. In my proof above, it is surprising that everything cancels so nicely.

Expanding on this:

Let’s say you have a special die, that always rolls sequentially through a random permutation of numbers. So for instance, it might roll through 3-2-5-4-6-1-3-2-5-4-6-1-etc. Clearly the average gap is 6 and the long-term frequency of each number is 1/6. How do we calculate E in this case? Well, it’s just:
1\frac{1}{6} + 2\frac{1}{5}\frac{5}{6} + 3\frac{1}{4}\frac{4}{5}\frac{5}{6} + 4\frac{1}{3}\frac{3}{4}\frac{4}{5}\frac{5}{6} + 5\frac{1}{2}\frac{2}{3}\frac{3}{4}\frac{4}{5}\frac{5}{6} + 6\frac{1}{1}\frac{1}{2}\frac{2}{3}\frac{3}{4}\frac{4}{5}\frac{5}{6}

Clearly, all that cancels to:
\frac{1}{6} + \frac{2}{6} + \frac{3}{6} + \frac{4}{6} + \frac{5}{6} + \frac{6}{6}

Giving the average of 3.5 that we expect.
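
A quick simulation of one cycle of that permutation die, starting at the top of the cycle, confirms the 3.5; the function name and setup here are mine:

```python
import random

def wait_on_permutation_die(target, sides, rng):
    """The die steps through a fresh random permutation of its faces;
    the wait for `target` is its position in that permutation."""
    perm = list(range(1, sides + 1))
    rng.shuffle(perm)
    return perm.index(target) + 1

rng = random.Random(0)
trials = 100_000
avg = sum(wait_on_permutation_die(6, 6, rng) for _ in range(trials)) / trials
print(avg)  # near (1 + 2 + ... + 6) / 6 = 3.5
```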

What sets that apart from my proof in #7 is that the rolls are not independent, making the tail much shorter (finite, even). The sum is totally different.

So any intuitive proof that doesn’t depend on the independence of die rolls can’t be complete. It’s not enough to just look at gap length and frequency.

Nice. Recognizing which givens are keystones means that you really understand the problem.

Here is the simpler derivation of the independent (ordinary die) case, using DPRK’s method. The original sum:
E = \sum_{k=1}^{\infty} k \frac{1}{N} (\frac{N-1}{N})^{k-1}

NE = \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k-1}

Extending the lower limit to k = 0 changes nothing, since the k = 0 term is zero:
NE = \sum_{k=0}^{\infty} k (\frac{N-1}{N})^{k-1}

Noting this equivalence:
\frac{d}{dx} \sum_{k=0}^{\infty} x^k = \sum_{k=0}^{\infty} kx^{k-1} = \frac{d}{dx} \frac{1}{1 - x} = \frac{1}{(1 - x)^2}

Performing the substitution with x = \frac{N-1}{N} (note 0 < x < 1, so the series converges):
NE = \frac{1}{(1 - \frac{N-1}{N})^2} = \frac{1}{(\frac{1}{N})^2} = N^2

E = N
\blacksquare