If an ordered array is randomly shuffled, my possibly faulty understanding is that the elements should have roughly equal probabilities of ending up anywhere in the shuffled array.

For simplicity, say I define the initial ordered array A with 100 elements such that a[sub]i[/sub] = i for i = 1 to 100, randomly shuffle it, and then note that the element i (which was initially in position i) is now in position j. If I then define a 100 x 100 distribution array D with all elements initially set to 0, and each time I randomly shuffle my initially ordered array A I increment the element d[sub]ij[/sub] where the initial element i in A ends up in position j, should I expect that after some reasonably large number of random shuffles the elements of D should all be approximately equal?

My initial guess is that any element i in A is less likely to remain in position i after a random shuffle, so I would expect the values along the diagonal of D to be lower than the rest. But on the other hand, if the shuffle is truly random, why wouldn’t an element occasionally remain in place?
Note that I’m not asking for how to perform a random shuffle, just trying to wrap my head around what a random shuffle actually means.
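To make the setup concrete, here’s a minimal Python sketch of the experiment I have in mind (N and R are just example values, and random.shuffle is standing in for whatever fair shuffle is used):

[code]
import random

N = 100          # array size, matching my example above
R = 100_000      # number of shuffles to accumulate

# d[i][j] counts how often the element that started at position i
# ends up at position j after a shuffle.
d = [[0] * N for _ in range(N)]

for _ in range(R):
    a = list(range(N))    # the ordered array: element i starts at position i
    random.shuffle(a)     # stand-in for a fair shuffle
    for j, i in enumerate(a):
        d[i][j] += 1

# If the shuffle is fair, every cell (diagonal included) should hover
# around R/N = 1000, with noise of roughly sqrt(R*(N-1))/N, about 31 here.
print(min(map(min, d)), max(map(max, d)))
[/code]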
As for what it means: you can visualize what happens to a deck of cards (it could be a small one, say 4 cards, if you want to enumerate all the distinct possibilities).
By symmetry, any card has an equal chance of ending up in any position. If you count where it ends up each time, the counts will not be exactly the same, but after a short while you should see them form a uniform distribution with few significant deviations from the expected number.
It’s not completely clear how OP defines random shuffle. A single riffle shuffle of a brand-new deck of cards will leave much order intact. Riffle the deck a few dozen times, using multiple riffle methods, and after all that you may approximate a random uniform shuffle, where each of the 52! possible orderings is expected to occur with 1/52! probability.
When shuffling the deck (#1, #2, …, #52) via computer there is a standard algorithm which almost every programmer knows:
Swap #1 with #j (j uniform random on {1, 2, 3, …, 52})
Swap #2 with #j (j uniform random on {2, 3, 4, …, 52})
Swap #3 with #j (j uniform random on {3, 4, 5, …, 52})
…
Swap #51 with #j (j uniform random on {51, 52})
Each of the 52! orderings will occur with 1/52! probability if the “uniform randoms” are reliable. Note that 1/52 of the time you swap #1 with itself (i.e. do nothing) and so on.
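In Python, with 0-based indexing, that procedure is roughly the following sketch (assuming random.randint is a reliable uniform source):

[code]
import random

def fisher_yates(deck):
    """Shuffle in place: each of the n! orderings is equally likely,
    provided random.randint is a reliable uniform source."""
    n = len(deck)
    for i in range(n - 1):               # positions #1 .. #51 above
        j = random.randint(i, n - 1)     # j uniform on {i, i+1, ..., n-1}
        deck[i], deck[j] = deck[j], deck[i]   # occasionally a self-swap
    return deck

deck = fisher_yates(list(range(1, 53)))  # cards #1 .. #52
[/code]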
“Almost every programmer” may have been an over-statement. Many get it wrong, including one on-line casino — sorry, I’m not going to try Google — which used a trivial bad shuffle and got exploited for it! :eek:
As an aside, when you shuffle a list, on average, you’ll expect exactly one element to end up in its original spot. And this is true regardless of the size of the list.
I must be interpreting this incorrectly. Looking at small sets, for the set with 1 element, exactly 1 element ends up in its original spot 100% of the time. For 2 elements, exactly 1 element ends up in its original spot 0% of the time (because either both elements will be in the same spot, or neither will). For 3 elements, half the possible orderings after a shuffle result in exactly one element in its original spot. For 4 elements, I make it 8 out of 24 orderings that have exactly one element in its original spot. What am I missing here? Does it only work for sets with an odd number of elements? That’s in line with what I would expect intuitively. But intuition is often extremely misleading when it comes to probability and statistics.
It has been shown that 7 random riffle shuffles of a 52 card deck will leave it within 1% of a perfectly random distribution. What is a random riffle shuffle? Divide the deck in two at a random place then randomly choose between the two packs and take the top card from the chosen pack. If you google “Trailing the dovetail shuffle to its lair” you will get a pdf of the paper by D. Bayer and P. Diaconis that establishes this result. Warning: heavy going, including a detour into the cohomology of commutative algebras in which an animal called the shuffle idempotent is described.
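Here is a rough Python sketch of one riffle in that spirit. Note that the GSR model analyzed in the paper cuts at a binomially distributed point rather than a uniform one, so treat this as an approximation:

[code]
import random

def riffle(deck):
    """One riffle shuffle, approximating the GSR model from the
    Bayer/Diaconis paper: cut the deck at a random place, then
    repeatedly take the top card of a randomly chosen packet."""
    cut = random.randint(1, len(deck) - 1)   # divide at a random place
    left, right = deck[:cut], deck[cut:]
    out = []
    while left and right:
        # GSR drops from a packet with probability proportional to its
        # current size; a fair coin would be a further simplification.
        if random.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out + left + right

deck = list(range(1, 53))
for _ in range(7):          # seven riffles, per the result above
    deck = riffle(deck)
[/code]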
Chronos meant that, averaging over a bunch of shuffles, you’d expect an average of one element in its original spot per shuffle.
So, for two elements, half the time they’ll move (zero elements in the original spot), and half the time they won’t (two elements in the original spot). So the average number of elements in the same spot (over lots of shuffles) is (0.5 × 0 + 0.5 × 2 =) one.
Likewise for any other list with more than one item: sometimes you’ll get no elements in the original spot, sometimes you’ll get more than one, but over lots of shuffles the average should be close to one element in its original spot.
Chronos: “you’ll expect exactly one element to end up in its original spot.”
Chronos’ usage of “expect” may be somewhat technical. With two elements, you’ll get two stay-at-homes half the time, and zero the other half the time. You expect that exactly one element in its original spot never happens! But on average there will be one stay-at-home. (50%×2 + 50%×0 = 1.)
With four elements you’ll get four stay-at-homes 1 time in 24, two 6 times in 24, one 8 times in 24, and zero 9 times in 24. Again the average is one.
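Those tallies are easy to check by brute force. Here’s a quick Python sketch that averages the fixed-point count over all n! equally likely orderings for small n:

[code]
from itertools import permutations

# For each small n, average the number of fixed points over all n!
# equally likely orderings; the mean comes out to exactly 1 every time.
for n in range(1, 7):
    perms = list(permutations(range(n)))
    total_fixed = sum(sum(p[i] == i for i in range(n)) for p in perms)
    print(n, total_fixed / len(perms))   # prints 1.0 for every n
[/code]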
ETA: Ninja’ed by Quercus.
You greatly overestimate the skills of typical programmers. Most, given this task, would do something like what you described but select j from 1…52 every time. That means there are 52[sup]52[/sup] equally likely outcomes, which is not a multiple of 52!, so the 52! permutations cannot all come out equally likely.
When I was a CS prof I saw this all over the place. Not just with students but in sample code posted on the Net. A USENET posting of a permutation generator was almost certain to be bad. But at least there’d soon be a follow-up pointing out the error. (Which wasn’t always believed. “Hey, it looks random.” Sigh.)
Yes; my “‘Almost every programmer’ may have been an over-statement” was itself a tongue-in-cheek understatement.
Think about the fact I mentioned: An on-line casino, with huge incentive to get this correct, managed to come up with the same wrong code as those CS drop-outs. The mind boggles. (And there are other simple facts about working with random numbers that are often overlooked.)
ETA: Referring to them as “CS drop-outs” was another misstatement on my part. Some are doubtless earning high salaries as software managers at Microsoft.
If you have N elements in your array, then on each shuffle d[sub]ij[/sub] will be 1 with probability 1/N and 0 otherwise. This is true regardless of whether i and j are equal. There will be some correlations involved, since if d[sub]ij[/sub] = 1 then d[sub]ik[/sub] is necessarily 0 for any k not equal to j, but each individual count will follow that probability.
If you repeat the experiment R times, and add up the d[sub]ij[/sub]'s, then each sum will follow a binomial distribution. On average each one will have a value of R/N, plus or minus sqrt(R*(N-1))/N.
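As a concrete check, plugging the OP’s N = 100 into those formulas (R here is just a hypothetical run count):

[code]
from math import sqrt

N = 100            # array size from the OP
R = 1_000_000      # a hypothetical number of shuffles

mean = R / N                 # expected count in each cell of D
sd = sqrt(R * (N - 1)) / N   # binomial standard deviation

print(mean, sd)   # 10000.0 and ~99.5: cells cluster around 10000 +/- 100
[/code]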
Um, it’s not a uniform distribution. Some permutations occur more than others. It is highly unlikely that this non-uniform distribution is more desirable than the uniform one and it is trivial to modify the code to get a uniform one.
Example: with 3 elements you will generate one of 3[sup]3[/sup]=27 outcomes, and there are 3!=6 permutations. 27/6 is 4.5. So some permutations occur 4 times in the outcomes and some occur 5 times. If you do it right all appear the same number of times. In this case once.
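That 27-outcome count is easy to reproduce exhaustively; a quick Python sketch:

[code]
from itertools import product
from collections import Counter

# Enumerate all 3^3 = 27 choice sequences of the naive shuffle
# (swap position i with a j drawn from the WHOLE range every time).
counts = Counter()
for choices in product(range(3), repeat=3):
    a = [0, 1, 2]
    for i, j in zip(range(3), choices):
        a[i], a[j] = a[j], a[i]
    counts[tuple(a)] += 1

print(counts)   # three permutations appear 4 times, three appear 5 times
[/code]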
Not a “uniform” distribution, of course; let’s call it a bell-shaped binomial distribution. Sorry for the typo. What is uniform is the probability of ending up in position i (it is always 1/100 in your case).
There is a good YouTube video of P. Diaconis discussing this paper. Numberphile has 6 or so videos with Diaconis discussing coin flips, dice, and things of that nature.
Thanks to everyone who has replied so far, especially septimus and Chronos. In reply to Buck Godot: I may not have been clear in my description of the distribution array; it holds a cumulative count of the outcomes of each shuffle.
The reason I posted this was that I needed to code up a random shuffle, found the Fisher-Yates algorithm and Durstenfeld’s implementation thanks to Wikipedia, and coded it up. I did exactly as I described in my original post as a brute-force check that I was getting the distribution I thought I should. But the cumulative values in the top left corner of the distribution (for a start-to-end shuffle) were all about half of the expected values, and when I switched to an end-to-start shuffle, the bottom right corner values were likewise about half of the expected values.
As it turns out, I had made a glaringly obvious (in hindsight) error in my code: instead of selecting the swap index j in the range i <= j <= N, I was restricting j to be greater than (and not equal to) i.
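For comparison, here’s the difference in Python (a sketch; the buggy index range is the kind of thing I had). I believe always forcing j past i is known as Sattolo’s algorithm, which produces only full-cycle permutations, i.e. no element can ever stay in place:

[code]
import random

def shuffle_correct(a):
    """Fisher-Yates: j may equal i, so self-swaps are allowed."""
    for i in range(len(a) - 1):
        j = random.randint(i, len(a) - 1)      # i <= j <= N-1
        a[i], a[j] = a[j], a[i]

def shuffle_buggy(a):
    """The off-by-one described above: j is forced past i.
    This is Sattolo's algorithm; it yields only full-cycle
    permutations, so no element ever remains in place."""
    for i in range(len(a) - 1):
        j = random.randint(i + 1, len(a) - 1)  # i < j <= N-1 (the bug)
        a[i], a[j] = a[j], a[i]
[/code]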
Yes, to clarify, I was using “expected value” in the technical sense of the mean, even though (as often happens with means) sometimes the “expected” value is itself impossible. Just like the average human has approximately one testicle and one ovary.
And “exactly 1” was a statement about that mathematically-defined expected value. 1 isn’t just a good approximation for the expected value, or valid for sufficiently large N, or the like: For any deck size, large or small, it’s always exactly 1.