Random Shuffle - expected distributions

Lance_Turbo · November 8, 2019, 3:46am

I get what you’re saying now. One could put some effort into more efficiently using random bits. I’ve never really thought about the problem or had a problem where random number generation was a bottleneck.

If anyone is interested in this problem it looks like the paper Fast Random Integer Generation in an Interval by Daniel Lemire might be a good place to start. The blog post Efficiently Generating a Number in a Range by M.E. O’Neill found Lemire’s method to be the fastest.

septimus · November 8, 2019, 4:27am

AFAIK, DSM-VI doesn’t even have a code for my pathology.

… And based on the date of your cite, I’m happy to conclude that Lemire’s method is based on my own publications on another message board!

Dead_Cat · November 8, 2019, 11:32am

Thanks for this; I was going to ask what the practical implications might be, and whether this would even be detectable by a human. I somehow doubt it.

I also don’t think a player would be in a position to take advantage often enough even if they knew about the flaw, unless the program also allowed nearly all the cards to be used before a shuffle.

Chronos · November 8, 2019, 12:02pm

Keeping track of the parity of a permutation isn’t something that would be obvious to most people, but it is something that a human could do without too much difficulty if they learned how (about the same level of difficulty as counting cards).

Chronos · November 8, 2019, 12:20pm

Oh, and what if you take a Quicksort, but replace all of the comparisons with random selections? That should use the right number of bits-- Is it a valid shuffle?

(I think this is equivalent to asking whether Quicksort is optimally efficient)

Come to think of it, it can’t be perfect, because Quicksort has a bounded worst-case performance, but a random selection between any number of alternatives that’s not a power of 2 has an unbounded worst case. But is it good enough?

septimus · November 8, 2019, 12:41pm

Any oblivious procedure using 50-50 coin tosses will have a number of equally likely outcomes equal to a power of two (i.e. some 2[sup]n[/sup]). This is (conceptually) true even if some sequences allow premature completion: omitting the final k tosses is the same as treating 2[sup]k[/sup] cases identically.

And, as ftg implied, such a set of size 2[sup]n[/sup] cannot possibly provide a perfectly perfect shuffle! Because (52!) will not divide 2[sup]n[/sup] evenly no matter how large n is.
In practice, computer PRNG’s are (almost?) always a sequence of 50-50 bits. So, a perfect perfect shuffle is, strictly speaking, impossible to guarantee in finite time! In practice this is not a problem but — given that simple sentences like “How many random bits do they spend to shuffle a deck? I spend about 7.05*32 bits, in close agreement with log2(52!)” turned out to be hard-to-understand :smack: — I’ll just let you take this on faith.

Chronos · November 8, 2019, 8:42pm

Yes, and that’s basically what I realized in my third paragraph. Quicksort has a maximum number of comparisons it will ever need to make, but a perfect shuffle algorithm does not have a maximum number of random bits that’s guaranteed to be enough, ergo, randomized Quicksort cannot be a perfect shuffle algorithm.

ftg · November 8, 2019, 8:55pm

Quicksort is Ω(n log n). The algorithm given by septimus back in the 3rd post is linear time. Basically as fast as one can get. It’s quick and dirty code without the dirt. Anything else is pretty much going to be slower and take longer to code.

The only real reason to discuss other methods concerns what methods use fewer random bits or are better if you don’t completely trust your RNG.

I just cannot fathom that there are companies out there who pay people whose brains come up with idiocies like “Let’s do a 1000 pair-wise random swaps.” It’s not just flawed, it’s a waste of computer and people time. (Remember: the maintenance cost of code is far greater than the original development cost. Doing stupid stuff makes the maintenance cost a lot greater. And in this particular case ridiculously worse.)

Chronos · November 8, 2019, 9:24pm

septimus’s algorithm there is also n log n in number of bits required. Remember, choosing a random number from 1 to n requires more bits (on average) for a larger n. And the bound that a sorting algorithm must be at least n log n is based on the same thing we’re talking about here, the number of bits needed to completely represent an n-element permutation.

septimus · November 8, 2019, 11:47pm

More precisely, and quite close for largish n, (n+.5) * log[sub]2/sub + (.919 - n)*1.442695 bits of random entropy are consumed for an entropy-minimal n-shuffle, where 1.442695 is approx log[sub]2/sub.

DPRK · November 9, 2019, 3:44am

If I am remembering correctly posts from people in the industry, slot machines and digital casinos have to pass all kinds of inspection and certification, but having a source of truly random numbers is not a requirement.

Q: is it a requirement to be able to generate every possible permutation of even a single 52-card deck?

Lance_Turbo · November 9, 2019, 4:31am

I worked for a company that was a trusted third party between casino game manufacturers and gambling regulators. We were an international company that did work in pretty much every gambling jurisdiction in the world. There is no jurisdiction that I know of where a pseudo-random number generator would not suffice.

I did twice have to test hardware RNGs. One was essentially a clever application of a Geiger counter. That thing was slow as fuck and actually failed on first pass. It didn’t fail because the source of randomness though. They were introducing detectable bias due to improper scaling.

The other was a pretty cool device that relied on the behavior of photons as governed by quantum mechanics. That thing was pretty sweet.

It is generally a requirement in every jurisdiction that every advertised outcome be possible. For card games there was often additional language that games needed to be faithfully simulating what would be expected if the game had been played with actual cards. We interpreted this to mean that every shuffle must be possible, but we never actually checked that because it would be computationally impossible. We did however fail games when we could prove that particular shuffles could not be produced as I alluded to in my earlier anecdote.

ftg · November 9, 2019, 1:15pm

You are confusing two different aspects of Complexity Theory. Quicksort analysis counts number of major operations. (Comparisons, swapping things, etc.)

If you count the number of bit operations then just a simple loop that counts from 1 to n takes O(n log n) time. We do not do this in Algorithm Analysis as long as the size of words are small. And log n is definitely considered small. After all, a 64-bit word size on a computer means you can do additions and such in constant time on numbers up to a bit over 4 billion. Go to double word size and a counter run from 0 to 2^128 would take an almost unimaginable amount of time.

If the algorithm actually requires long integers or some such, then counting bits matters. (E.g., prime factoring.)

There’s a lot going on in Algorithm Analysis to keep things in the proper perspective that a non-expert wouldn’t understand. But let me assure you that your claim about needing n log n random bits immediately requires n log n time is not at all how anyone knowledgeable in the field would see things. Again, it is no different than claiming that a simple for loop with no body takes n log n time.

Chronos · November 9, 2019, 5:04pm

Yes, Quicksort needs n log(n) in all operations, but that’s because the other operations are proportional to the comparisons. Any sorting routine must use at least n log(n) comparisons, because in order to sort a list, you must (directly or indirectly) determine what order it was in originally, which means you must distinguish one of the n! permutations, which means you need n log(n) bits of information about the original arrangement.

And yes, it’s true that comparison isn’t actually a constant-time operation, because it in principle takes longer for larger numbers. But for any practical number size, it’s still far quicker than generating a good random bit. A random bit takes far longer than anything that would ordinarily be considered a “fundamental operation”, and so it’s absurd to measure the complexity of a shuffle as a count of anything other than random bits.

Lance_Turbo · November 9, 2019, 7:29pm

Some Python 3 code…


from random import shuffle
import datetime

def shuffle_time_trial(n, trials):
    my_list = list(range(n))
    start = datetime.datetime.now()
    for t in range(trials):
        shuffle(my_list)    
    print(n, trials, datetime.datetime.now() - start)
    
trials = 100000
for n in range(100, 600, 100):
    shuffle_time_trial(n, trials)

It outputs…


100 100000 0:00:07.136390
200 100000 0:00:14.232014
300 100000 0:00:21.538563
400 100000 0:00:29.174170
500 100000 0:00:36.412494

This is not surprising since each shuffle calls the RNG n - 1 times.

Each call to the RNG produces 64 random bits. septimus pointed out that this is far more random bits than is absolutely necessary and suggests that he has a method that, on average, needs just a touch over log[sub]2[/sub] n! bits which is a lot less than 64 * (n-1) for all reasonable values of n (note: it does eventually surpass the linear method at values well above the number of things we might want to shuffle, it starts catching up at 2^64).

Chronos · November 9, 2019, 9:36pm

There is no linear method. You’re assuming that a routine that gives n random bits is always the same thing, no matter the value of n.

Lance_Turbo · November 10, 2019, 1:03am

The code I posted generates 64 * (n - 1) random bits. That is indeed linear with respect to n.

septimus · November 10, 2019, 4:49am

Miscellaneous remarks:

Complexity quotes can get weird when you try to be very precise. I recall a computational geometry paper (addressing a problem like taking an awkwardly shaped piano through a series of doorways) where inside the O(·), in addition to the usual n and log n factors, was A[sup]-1/sup — the inverse Ackermann function! (Not the ridiculously rapid-growing Ackermann. The inverse Ackermann which plods along far more slowly than any imaginable tortoise!)

Even very high quality PRNGs are very fast. Hardware RNGs may be much more expensive. (How do the applications which think they need “true” randoms work anyway? I’d guess that, if speed is an issue, they use the hardware randoms as seeds for a software PRNG.)

I’m not sure what Lance’s demo is intended to prove. Of course a procedure that requires O(n · log n) time can be presented as O(k · n) if we choose some k > log max_n and experiment only with n < max_n!

OTOH, I’m happy to admit that my own random library, which minimizes random-bit consumption, was written just for fun. Fun is good!

BTW, another common use for randomness is ’ { return 1 with probability p, otherwise 0 } ’ Most codes will consume 32 (or 64) bits for this, although only 1 bit is needed when p is exactly 0.5. If we’re thrifty with our random bits we can consume less than 1 bit when p ≠ 0.5 ! For example, suppose p = 0.25 exactly or p = 0.75. There’s a trivial way to ‘toss such a biased coin’ while consuming exactly 2 random bits; in fact barely 0.811 bits are needed. Coding for this is, again, unnecessary, but Fun!

Finally, looking over my own random library just now, I see a puzzle: Pick a random point uniformly distributed on an n-dimensional ball (or an n-dimensional sphere — take your pick). Assume n may be large. No fair Googling? Please start a new MPSIMS thread to present any solution.

septimus · November 10, 2019, 4:57am

This should be
If we’re thrifty with our random bits we can consume less than 1 bit (on average) whenever p ≠ 0.5 !
Damn that 5-second Edit window!

Lance_Turbo · November 10, 2019, 5:48am

That is exactly what it is intended to show.

Topic		Replies	Views
Optimum number of shuffles to randomize cards? Factual Questions	21	11861	February 17, 2010
Another probability question (not homework!) Factual Questions	28	1905	February 6, 2017
How many shuffles of a deck before it's back in the original order? Factual Questions	62	983	November 20, 2025
Every time you shuffle a deck of cards, the sequence has never existed before Miscellaneous and Personal Stuff I Must Share	73	3435	March 15, 2021
Card Shuffling Factual Questions	22	2061	January 31, 2006

Random Shuffle - expected distributions

Related topics