I have a personal anecdote that is exactly on point. Big ass technical post follows…
In a former job one of the things my company did was verify the math models of casino games. This included analysis of PRNGs in general and PRNG applications that shuffled decks of cards. Part of this was done by generating millions of shuffles and analyzing those with a battery of tests that determined if the data set was statistically distinguishable from perfectly uniformly random.
A different mathematician was evaluating a blackjack game that used the following algorithm for shuffling blackjack shoes that contained 1, 2, 6, and 8 decks:
- Start with an array [0, 1, … , 50, 51] * (number of decks).
- Randomly choose two distinct indices of this array and swap the elements in those spots. (The underlying PRNG used for this was the Mersenne Twister for those interested.)
- Do step 2 a total of 1000 * (number of decks) times.
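For concreteness, here's a minimal Python sketch of that procedure (the function name and the 0-51 card encoding are my own; conveniently, Python's built-in random module is also a Mersenne Twister):

```python
import random

def shoe_shuffle(num_decks: int, rng: random.Random) -> list[int]:
    """Repeated-swap shuffle described above: 1000 * num_decks swaps
    of two distinct positions, driven by a Mersenne Twister."""
    shoe = list(range(52)) * num_decks
    for _ in range(1000 * num_decks):
        i, j = rng.sample(range(len(shoe)), 2)  # two distinct indices, per step 2
        shoe[i], shoe[j] = shoe[j], shoe[i]
    return shoe
```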
This was not a perfectly uniformly random shuffle for a couple of reasons, but we could certify it for use in the field if it was statistically indistinguishable from an unbiased shuffle.
So this algorithm was passing most of the tests. As an aside, one of the tests was essentially the test proposed in the OP: look at what shows up at each position and see if it matches the expected distribution (uniform in this case).
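If you want to run that positional check yourself, a chi-squared statistic per position is one simple way to do it; here's a rough single-deck sketch (my illustration, not our actual test harness):

```python
from collections import Counter

def positional_chi2(shuffles: list[list[int]]) -> list[float]:
    """One chi-squared statistic per position: do the cards landing
    there look uniform over the 52 possible values? (single deck)"""
    expected = len(shuffles) / 52
    stats = []
    for pos in range(52):
        counts = Counter(s[pos] for s in shuffles)
        stats.append(sum((counts[c] - expected) ** 2 / expected for c in range(52)))
    return stats
```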
It was, however, super failing one of the tests, called the longest cycle test. This test looked at each shuffle as a permutation written in disjoint cycle notation and compared the distribution of the length of each shuffle's longest cycle to the expected distribution for an unbiased shuffle (which is very, very not uniform, FYI).
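Computing the longest cycle is just a walk over the permutation; a sketch, treating a single-deck shuffle result as a permutation in one-line notation (perm[i] is the card that ends up at position i):

```python
def longest_cycle(perm: list[int]) -> int:
    """Length of the longest cycle in the disjoint cycle
    decomposition of a permutation in one-line notation."""
    seen = [False] * len(perm)
    best = 0
    for start in range(len(perm)):
        length = 0
        i = start
        while not seen[i]:  # walk this cycle until it closes
            seen[i] = True
            i = perm[i]
            length += 1
        best = max(best, length)
    return best
```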
The manner of the failure was pretty bizarre, and the reason I was called in on the project. Only the single-deck implementation was failing, and it was failing because it never produced a shuffle with a longest cycle of length 52. In the grand scheme of things these are relatively rare (a uniformly random shuffle is a single 52-cycle with probability exactly 1/52), but we had a huge data set, so we were expecting thousands and were instead getting exactly zero.
What was the problem? Permutations come in even and odd varieties. There are a couple of different, logically equivalent ways to define this, but I'll go with this one: every permutation group has a canonical representation as a group of matrices (each permutation becomes the matrix that permutes the coordinates that way). Half the matrices in this group have a determinant of 1, and we'll call those permutations even; the other half have a determinant of -1, and we'll call those odd (provided we're permuting more than one item). The reason I'm defining things this way is that the determinant of a product of matrices is the product of the determinants of those matrices.
One more fact that we need is that the determinant of a transposition (swapping two elements) is always -1.
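In code you never need to build the actual matrices; because a cycle of length k is a product of k - 1 transpositions, the determinant (sign) falls straight out of the cycle count. A sketch:

```python
def permutation_sign(perm: list[int]) -> int:
    """Sign of a permutation: +1 for even, -1 for odd. Each cycle of
    length k contributes k - 1 transpositions, so the sign is
    (-1) ** (n - number_of_cycles)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
    return -1 if (n - cycles) % 2 else 1
```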
So the algorithm always produced a shuffle equivalent to the product of 1,000 transpositions, so it had a determinant of (-1)^1000 = 1. Meanwhile, every cycle of length 52 can be written as the product of 51 transpositions, so it has a determinant of (-1)^51 = -1. So while the algorithm was pretty good in most ways, it was incapable of producing any odd permutation, and we lucked out in that a subset of the odd permutations (the 52-cycles) was exactly the kind of permutation one of our tests was looking for.
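Putting the sketches above together, the whole failure mode fits in a few lines (the seed is arbitrary):

```python
rng = random.Random(2024)
single_deck = shoe_shuffle(1, rng)           # one deck: the result is itself a permutation
assert permutation_sign(single_deck) == 1    # 1000 transpositions -> always even

full_cycle = list(range(1, 52)) + [0]        # the 52-cycle 0 -> 1 -> ... -> 51 -> 0
assert longest_cycle(full_cycle) == 52
assert permutation_sign(full_cycle) == -1    # odd, so the algorithm can never reach it
```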
One extra bit of weirdness: the multi-deck versions of the algorithm weren't failing, because sometimes the algorithm would swap a card with a copy of itself, which was equivalent to doing nothing. Thus the multi-deck versions could produce both even and odd permutations, and did so with equal frequency. I'm leaving out some technical details of decomposing multi-deck shuffles into disjoint cycles, but that's the gist of it.
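To see the null swap concretely (again using my 0-51 encoding, where two decks means every card value appears twice):

```python
two_decks = list(range(52)) * 2
before = two_decks.copy()
# positions 0 and 52 hold identical cards, so this transposition does nothing:
two_decks[0], two_decks[52] = two_decks[52], two_decks[0]
assert two_decks == before
```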