Actual words (i.e., a non-uniform probability of collecting each of the 325 “coupons”) were considered in Post #11. Since one would have to estimate these probabilities by reading a dictionary anyway, I’m not sure a Monte Carlo estimate is really cheating compared to an “exact” answer in that case.
Combinatorist, obviously
Obviously n can’t be too small to use that approximation; it should be at least 1000 or so. Perhaps a saddle point method applied to the function in Post #16 will yield a complete asymptotic expansion; do you have a good computer algebra system?
I think you could read all the words into a trie that counts how many words start each way and then traverse the trie and get actual counts for each of the 325 pairings across the whole dictionary.
I don’t think I care enough to code that up, but you could then calculate the recurrence relation exactly in a reasonable time.
It was a software job. I don’t know the specifics.
To clarify: The question I asked in this thread was not the question he was asked to answer in the interview. He was asked a different (simpler, algorithmic rather than combinatoric) question, and as we were discussing that one, this was one of the spinoff “I wonder…”.