Probability formula for number of rolls to get a particular result

If I have an N-sided die and I want to hit a particular number in that range 1-N, what is the formula for the average or expected number of times I’ll have to roll it before I get the one I want?

Short answer: N.

The number of rolls it takes until you first get a particular value follows a geometric distribution.

The mean, or expected value, of such a distribution is 1/p. In this case, assuming all sides of the die are equally likely to come up, p = 1/N, so the mean is 1/(1/N) = N.
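
As a quick sanity check, here's a Monte Carlo sketch in Python; the die size, seed, trial count, and function name are my own choices, not anything from the thread:

```python
import random

def rolls_until(target, sides, rng):
    """Count rolls of a fair `sides`-sided die until `target` first appears."""
    count = 0
    while True:
        count += 1
        if rng.randint(1, sides) == target:
            return count

rng = random.Random(0)
N, trials = 6, 100_000
avg = sum(rolls_until(6, N, rng) for _ in range(trials)) / trials
print(avg)  # close to the predicted mean of N = 6
```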

(I assume that if I am mistaken about any of this, other Dopers will be along to correct me.)

Thanks, both of you. Very helpful.

Of course, even though the mean is N, in practice it can take significantly more or fewer rolls than that. In particular, if you’re rolling until you’ve hit all of the numbers, that’ll usually take a lot longer than N. Whatever the last one is, it’ll seem like the die is biased against that number, but really, something has to be the last one.

IIRC, this is called the “coupon collector problem”.

The coupon collector problem is what you get when you add up a bunch of those waiting times, one for each number. For a single number, people usually talk about a geometric distribution. The point is that it has a long tail.
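
For the coupon-collector version (rolling until every face has appeared), the expected wait is N times the N-th harmonic number, 6·(1 + 1/2 + … + 1/6) = 14.7 for a six-sided die. A simulation sketch, with my own names and parameters:

```python
import random

def rolls_to_see_all(sides, rng):
    """Roll a fair die until every face has appeared at least once."""
    seen, count = set(), 0
    while len(seen) < sides:
        count += 1
        seen.add(rng.randint(1, sides))
    return count

rng = random.Random(0)
N, trials = 6, 50_000
avg = sum(rolls_to_see_all(N, rng) for _ in range(trials)) / trials
expected = N * sum(1 / k for k in range(1, N + 1))  # N * H_N = 14.7
print(avg, expected)
```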

You got the intuitive answers. But here’s the more mathematical breakdown (Doing it the brute-force way. I’m sure there’s something more elegant.).

The expected value of the number of rolls to hit a particular number on a 6-sided die is a sum that looks like this:
E = 1\cdot\frac{1}{6} + 2\cdot\frac{1}{6}\cdot\frac{5}{6} + 3\cdot\frac{1}{6}\left(\frac{5}{6}\right)^2 + \ldots

I.e., 1 times the odds of hitting it on the first roll; 2 times the odds of hitting it on the second, but not the first; and so on.

Generalized and put in summation form:
E = \sum_{k=1}^{\infty} k \frac{1}{N} (\frac{N-1}{N})^{k-1}

Rearrange things a bit:
N E = \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k-1}

N E = 1 (\frac{N-1}{N})^0 + \sum_{k=2}^{\infty} k (\frac{N-1}{N})^{k-1}

N E = 1 + \sum_{k=2}^{\infty} k (\frac{N-1}{N})^{k-1}

Renumber k:
N E = 1 + \sum_{k=1}^{\infty} (k + 1) (\frac{N-1}{N})^{k}

N E = 1 + \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k} + \sum_{k=1}^{\infty} 1 (\frac{N-1}{N})^{k}

N E = 1 + \frac{N-1}{N} \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k-1} + \sum_{k=1}^{\infty} 1 (\frac{N-1}{N})^{k}

Key step here! Recognize that the term in the middle is equal to NE, from way back:
N E = 1 + \frac{N-1}{N} NE + \sum_{k=1}^{\infty} 1 (\frac{N-1}{N})^{k}

N E = 1 + E (N-1) + \sum_{k=1}^{\infty} (\frac{N-1}{N})^{k}

Next key step! The other sum is an ordinary geometric series; with x = \frac{N-1}{N}, it equals \frac{x}{1-x}:
N E = 1 + E (N-1) + \frac{\frac{N-1}{N}}{1- \frac{N-1}{N}}

N E = 1 + E (N-1) + (N - 1)

N E = 1 + EN - E + N - 1

The NE on the left cancels the EN on the right, leaving 0 = N - E, so:

E = N

QED
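
The series itself is easy to check numerically; here's a small sketch (the truncation point and the sample values of N are arbitrary choices of mine):

```python
def partial_expectation(N, terms):
    """Truncated sum of E = sum_{k>=1} k * (1/N) * ((N-1)/N)**(k-1)."""
    p, q = 1 / N, (N - 1) / N
    return sum(k * p * q ** (k - 1) for k in range(1, terms + 1))

for N in (4, 6, 20):
    print(N, partial_expectation(N, 10_000))  # each close to N
```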

If you are going to recognize geometric series, you may as well note that

\sum_{k\ge0}x^k = \frac{1}{1-x}

so

\frac{d}{dx}\sum_{k\ge0}x^k = \sum_{k\ge0}kx^{k-1} = \frac{1}{(1-x)^2}

and

\sum_{k\ge0}kx^k=\frac{x}{(1-x)^2}.
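
Those identities are easy to spot-check numerically for |x| < 1; a sketch, with a truncation point I picked arbitrarily:

```python
def sum_k_xk(x, terms=10_000):
    """Truncated sum of k * x**k for k >= 0."""
    return sum(k * x ** k for k in range(terms))

for x in (0.1, 0.5, 5 / 6):
    closed_form = x / (1 - x) ** 2
    print(x, sum_k_xk(x), closed_form)  # the two agree closely
```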

More generally you can expand (1-x)^{-a}, and there is a lot you can do to manipulate sums using hypergeometric functions.

Nice one. I was wondering how to get that k out in front and didn’t think about using the derivative. That definitely would have saved some steps. Well, at least the proof I gave didn’t require calculus…

It is a mistake to disregard any kind of calculus or complex analysis when dealing with probability, but in this case what we are dealing with is the binomial series, which may be widely familiar.

Most certainly. Nothing against calculus of course, but I think it’s still useful to have a purely algebraic answer in case the OP’s abilities stop there. Speaking personally, while I find calculus-based steps like your example are easy to understand, actually using them as a tool when constructing proofs is much more difficult. It takes much more of a proof mindset than I have the practice for.

Apologies, especially since I accidentally posted the wrong graph!!! Let me make my own plots:

Let p = 1/6.

This is the geometric distribution.

The y-axis is the probability mass. The blue line tends to zero exponentially, though there is still a substantial tail past the expectation, which in this case is 6. Also NB: only positive integer values of x are relevant here, even though I drew a smooth line through all the points.
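
That tail is easy to quantify: the chance of needing more than m rolls is ((N-1)/N)^m, i.e. the chance of m straight misses. A quick sketch (my own helper name):

```python
def tail(N, m):
    """P(first hit takes more than m rolls) = ((N-1)/N)**m."""
    return ((N - 1) / N) ** m

for m in (6, 12, 24):
    print(m, round(tail(6, m), 4))
# Even after 6 rolls (the mean), there is still roughly a 33% chance
# of not having hit the target number yet.
```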

It’s much, much easier to prove. Imagine that you make many, many rolls of the die (like, say, 6n, in the limit as n → infinity). How many 6s will you get? n, of course, by the definition of probability. And if you have n 6s in 6n rolls, then there must, on average, be 6 rolls in between.
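
That frequency argument shows up directly in a simulated run; the sequence length and seed below are arbitrary choices of mine:

```python
import random

rng = random.Random(2)
N, T = 6, 600_000
rolls = [rng.randint(1, N) for _ in range(T)]

positions = [i for i, r in enumerate(rolls) if r == 6]
gaps = [b - a for a, b in zip(positions, positions[1:])]
freq = len(positions) / T          # fraction of rolls that are 6s, near 1/6
avg_gap = sum(gaps) / len(gaps)    # average spacing between 6s, near 6
print(freq, avg_gap)
```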

Typical dope response to a simple probability question.

The correct one-letter answer is given in post 2, followed by 15 posts deconstructing the problem in every way possible. Alas, leaving very little left for me to add my own two cents. :frowning:

God I love this place, at long last I’ve found my tribe. :smile:

That is the intuitive answer that I referenced.

And yet there is something mysterious about it to me. If you consider a long random sequence, the gaps between occurrences of the same number clearly must average 6. But if I “join” that sequence at a random place, shouldn’t I land, on average, about halfway through a gap? Why is the number of rolls until the number is hit not half the average gap length?

If you have a sequence that is exactly one six every sixth roll, then if you pick a random starting point, there’s an equal chance that there’s 1, 2, 3, 4, or 5 rolls to the next 6 - and the average of that is 3. But we’re looking at a sequence that averages one six every sixth roll, so if you start at random in that sequence, there’s a chance that there’s 1, 2, 3, 4, 5, 6, 7, 8, 9, and so on, rolls before the next six - and the average of that turns out to be 6 (from the arguments above).

I agree with that, and it’s more or less what I came up with. But this is rather hand-wavey:

there’s a chance that there’s 1, 2, 3, 4, 5, 6, 7, 8, 9, and so on, rolls before the next six - and the average of that turns out to be 6 (from the arguments above).

The answer is straightforward mathematically. But it’s not immediately obvious why the weighted average of all possible roll gaps from an arbitrary starting point should equal the average gap length.

Fair enough. It is weird that if you pick an arbitrary startpoint in the middle of a long sequence of rolls, the average number of rolls to the next six is the same as the average number of rolls since the last six.
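
That symmetry can be checked by simulation: join a long run at random spots and measure the wait forward to the next 6 and backward to the most recent 6. The sequence length, seed, and margins below are arbitrary choices of mine:

```python
import random

rng = random.Random(1)
N = 6
seq = [rng.randint(1, N) for _ in range(1_000_000)]

samples = 50_000
fwd_total = bwd_total = 0
for _ in range(samples):
    i = rng.randrange(300, len(seq) - 300)  # stay clear of the ends
    j = i
    while seq[j] != 6:          # rolls from i (inclusive) to the next 6
        j += 1
    fwd_total += j - i + 1
    j = i - 1
    while seq[j] != 6:          # rolls back from i to the most recent 6
        j -= 1
    bwd_total += i - j

print(fwd_total / samples, bwd_total / samples)  # both near 6, not 3
```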

Yeah. In my proof above, it is surprising that everything cancels so nicely.

Expanding on this:

Let’s say you have a special die, that always rolls sequentially through a random permutation of numbers. So for instance, it might roll through 3-2-5-4-6-1-3-2-5-4-6-1-etc. Clearly the average gap is 6 and the long-term frequency of each number is 1/6. How do we calculate E in this case? Well, it’s just:
1\frac{1}{6} + 2\frac{1}{5}\frac{5}{6} + 3\frac{1}{4}\frac{4}{5}\frac{5}{6} + 4\frac{1}{3}\frac{3}{4}\frac{4}{5}\frac{5}{6} + 5\frac{1}{2}\frac{2}{3}\frac{3}{4}\frac{4}{5}\frac{5}{6} + 6\frac{1}{1}\frac{1}{2}\frac{2}{3}\frac{3}{4}\frac{4}{5}\frac{5}{6}

Clearly, all that cancels to:
\frac{1}{6} + \frac{2}{6} + \frac{3}{6} + \frac{4}{6} + \frac{5}{6} + \frac{6}{6}

Giving the average of 3.5 that we expect.
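
A quick simulation of one cycle of that permutation die, starting at the top of the cycle, confirms the 3.5; the function name and setup here are mine:

```python
import random

def wait_on_permutation_die(target, sides, rng):
    """The die steps through a fresh random permutation of its faces;
    the wait for `target` is its position in that permutation."""
    perm = list(range(1, sides + 1))
    rng.shuffle(perm)
    return perm.index(target) + 1

rng = random.Random(0)
trials = 100_000
avg = sum(wait_on_permutation_die(6, 6, rng) for _ in range(trials)) / trials
print(avg)  # near (1 + 2 + ... + 6) / 6 = 3.5
```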

What sets that apart from my proof in #7 is that the rolls are not independent, making the tail much shorter (finite, even). The sum is totally different.

So any intuitive proof that doesn’t depend on the independence of die rolls can’t be complete. It’s not enough to just look at gap length and frequency.

Nice. Recognizing which givens are keystones means that you really understand the problem.

Here is the simpler derivation of the independent (ordinary die) case, using DPRK’s method. The original sum:
E = \sum_{k=1}^{\infty} k \frac{1}{N} (\frac{N-1}{N})^{k-1}

NE = \sum_{k=1}^{\infty} k (\frac{N-1}{N})^{k-1}

Extending the lower limit to k = 0 changes nothing, since the k = 0 term is zero:
NE = \sum_{k=0}^{\infty} k (\frac{N-1}{N})^{k-1}

Noting this equivalence:
\frac{d}{dx} \sum_{k=0}^{\infty} x^k = \sum_{k=0}^{\infty} kx^{k-1} = \frac{d}{dx} \frac{1}{1 - x} = \frac{1}{(1 - x)^2}

Performing the substitution with x = \frac{N-1}{N} (note 0 < x < 1, so the series converges):
NE = \frac{1}{(1 - \frac{N-1}{N})^2} = \frac{1}{(\frac{1}{N})^2} = N^2

E = N
\blacksquare