Suppose that I have a fair n-sided die, and I want to roll it until I get all n outcomes. This is probably going to take more than n rolls, because I’m probably going to get some repeats. For instance, if n = 10, it’ll take, on average, over 29 rolls in order to get all of them.
But this means that there will probably be a significant span of rolls where I’ve gotten nine of my ten outcomes and am diligently pounding away at that last one. And even once I do get it, I’ll have gotten that one only once, while each of my other outcomes will have come up around three times on average. So it’s going to seem to me like that last outcome, whatever it is, is really rare, while the others will seem fairly common.
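A quick Monte Carlo sketch in Python bears this out (the function name and trial count are just illustrative):

```python
import random

def rolls_to_collect_all(n):
    """Roll a fair n-sided die until every face has appeared; return the roll count."""
    seen = set()
    rolls = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        rolls += 1
    return rolls

n, trials = 10, 100_000
average = sum(rolls_to_collect_all(n) for _ in range(trials)) / trials
print(f"average rolls to see all {n} faces: {average:.2f}")   # comes out around 29.3
# The face that completes the set has, by definition, shown up exactly once,
# so the other nine faces average roughly (29.3 - 1) / 9, i.e. about 3.1 each.
```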
It’s only an illusion, of course, and it varies from person to person. Like, I might say to someone attempting the same task, “Man, trying to get a three takes forever, doesn’t it!?”, to which they might reply, “What do you mean? I’m always getting threes. It’s sevens that are the rare ones.” In reality, of course, neither is actually rare: all outcomes are always equally likely.
You know that all results are equally likely, but you’re mixing up your experience of past rolls (which have no effect on the future odds) with the “rarity” of waiting to get your last number. By definition, every time you miss, you load up on the numbers you already have, making them seem “common” and your missing number seem “rare”.
I think it’s more a matter of realizing that p(x) > 0 means the event really can occur. For example, the chance of rolling doubles is 1/6, so the chance of rolling doubles 10 times in a row is less than 1.7 x 10^-8, but that still means it can happen. So when it does happen and gets noticed as unusual (rightly or wrongly), I would classify it as confirmation bias.
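For what it’s worth, here’s a two-line check of that figure (taking “doubles” to mean a 1-in-6 event on each roll):

```python
from fractions import Fraction

p = Fraction(1, 6) ** 10          # ten 1-in-6 events in a row
print(p, float(p))                # 1/60466176, about 1.65e-08 -- indeed under 1.7e-08
```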
Isn’t the Gambler’s fallacy the opposite of the OP’s illusion? The gambler actually acknowledges that all outcomes should be equally likely, but mistakenly thinks that previous events should influence future events, so that the “hard to get” number should now crop up more often to balance things out.
The OP, on the other hand, describes a different but equally problematic result of trying to do intuitive statistics: trying to estimate the probabilities of events from a process whose stopping condition inherently biases what you observe.
If there are situations where this activity occurs, there may be a name for it. Is there any activity other than idle dice tossing and bad statistics where you’d try throwing a die until you get all possible results?
I was interested in the relationship between n and E(n), the expected number of rolls required. It’s easy to find a recurrence relation (which gives, e.g., E(10) = 7381/252 = 29.28968…), but I didn’t find a closed form. The sequence 2!E(2), 3!E(3), 4!E(4), … is not in Sloane’s On-Line Encyclopedia.
Asymptotically, I think E(n) ≈ n*ln(n). Surely this problem is well known?
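One way to set the recurrence up (just a sketch): let R[k] be the expected number of further rolls once k distinct faces have been seen; then R[n] = 0 and R[k] = 1 + (k/n)R[k] + ((n-k)/n)R[k+1], which simplifies to R[k] = n/(n-k) + R[k+1], with E(n) = R[0].

```python
from fractions import Fraction

def expected_rolls(n):
    """Unwind R[k] = n/(n-k) + R[k+1] from R[n] = 0 down to R[0] = E(n), exactly."""
    r = Fraction(0)
    for k in range(n - 1, -1, -1):
        r += Fraction(n, n - k)
    return r

print(expected_rolls(10), float(expected_rolls(10)))   # 7381/252  29.2896825...
```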
This is correct. It comes from the harmonic series: The general form is E(n) = Sum(i=1…n) n/i. This is because on your first roll, you’re guaranteed to get a new result: Your expected number of rolls to get a new result is 1 (or n/n). Once you’ve gotten a single result, the probability of getting another new one is (n-1)/n, and so your expected number of rolls to get your second unique result is n/(n-1). For your third one, the probability of a new result is (n-2)/n for an expectation of n/(n-2) rolls, and so on, until when you only have one left, your chance of success on each roll is 1/n, and so your expected number of rolls to get that last one is n.
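In code, that sum is a one-liner; here’s a small sketch comparing it with the n*ln(n) guess from the previous post (the leftover gap is roughly 0.577*n, since H_n ≈ ln(n) plus the Euler-Mascheroni constant):

```python
from math import log

def coupon_expectation(n):
    """E(n) = sum of n/i for i = 1..n, i.e. n times the nth harmonic number."""
    return n * sum(1 / i for i in range(1, n + 1))

for n in (10, 100, 1000):
    print(f"n={n:5d}   E(n)={coupon_expectation(n):9.2f}   n*ln(n)={n * log(n):9.2f}")
# For n=10 this gives 29.29 vs 23.03; the gap grows like 0.577*n, but the ratio
# tends to 1, so E(n) ~ n*ln(n) asymptotically.
```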
And the specific case that prompted my interest was a set of achievements in a computer game. For instance, in one spot, there’s (sometimes) a device that shows up in the game that can give one of ten possible outcomes, and there’s an achievement for having encountered all ten (since this device often doesn’t show up at all, and is in a very large area that takes a long time to search, this is a tedious slog). And there’s also one for finding all of a set of randomly-occurring enemies (though this one isn’t quite the same situation, since some of those enemies really are rarer than others, but the effect would still show up).
Surely it is! Chronos gave a perfect explanation of why E(n) is precisely n times the nth harmonic number, but in case you care to read up on it more, this generally goes by the name “coupon collector’s problem”.