Probability of getting items 1-N when selecting from population of 1-M (M>N)

Actual words (i.e., a non-uniform probability of collecting each of the 325 “coupons”) were considered in Post #11. Since one would have to estimate these probabilities by reading a dictionary anyway, I’m not sure a Monte Carlo estimate is really cheating compared to an “exact” answer in that case.

Combinatorist, obviously :slight_smile:

Obviously n can’t be too small to use that approximation; it should be at least 1000 or so. Perhaps a saddle point method applied to the function in Post #16 will yield a complete asymptotic expansion; do you have a good computer algebra system?

I think you could read all the words into a trie that counts how many words start each way and then traverse the trie and get actual counts for each of the 325 pairings across the whole dictionary.

I don’t think I care enough to code that up, but you could then calculate the recurrence relation exactly in a reasonable time.

It was a software job. I don’t know the specifics.

To clarify: The question I asked in this thread was not the question he was asked to answer in the interview. He was asked a different (simpler, algorithmic rather than combinatoric) question, and as we were discussing that one, this was one of the spinoff “I wonder…”.