Random Shuffle - expected distributions

OK, I can see how a 25% chance needs less than 2 bits on average, because half the time, you can stop after the first bit, and not need to call the second one at all. But I can’t see how you could do it in less than 1 bit on average, because that means that you’d sometimes be using zero bits, and how do you decide if you’re in the zero-bit case without using any randomness?

Unless you’re referring to a large number of such “biased coin flips” in series, and we’re using extra entropy left over from one of the previous trials, I suppose.
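
To make the “stop after the first bit” idea concrete, here is a tiny Python sketch (the function name and the use of random.getrandbits are just my choices for the illustration):

```python
import random

def flip_quarter(randbit=lambda: random.getrandbits(1)):
    """Return True with probability 1/4, using 1.5 fair bits on average.

    Half the time the first bit is 0 and we stop immediately (answer False);
    otherwise a second bit splits the remaining 1/2 into two 1/4 outcomes.
    """
    if randbit() == 0:
        return False           # probability 1/2, only one bit consumed
    return randbit() == 0      # probability 1/4 for True, 1/4 for False

# Expected bits per call: 1*(1/2) + 2*(1/2) = 1.5
```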

If your question is, what if “H” has probability 25% and “T” has probability 75%, and you want to represent a sequence of coin flips using a string of bits, then you can do it using a simple arithmetic encoding.


The optimal such code will use 0.811278124459132864 bits per flip on average, by Claude Shannon’s entropy formula: -0.25 log2(0.25) - 0.75 log2(0.75) bits, or equivalently -0.25 ln(0.25) - 0.75 ln(0.75) nats.
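
A quick Python check of those numbers (nothing more than plugging into the entropy formula):

```python
from math import log, log2

p = 0.25
h_bits = -p * log2(p) - (1 - p) * log2(1 - p)   # 0.8112781244591328...
h_nats = -p * log(p) - (1 - p) * log(1 - p)     # 0.5623351446188083...
print(h_bits, h_nats, h_nats / log(2))          # the last value equals h_bits
```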

Arithmetic coding converts biased flips to bits (“50-50 bits”). Here we need the inverse problem: converting bits to biased flips. As I mentioned above (“in fact barely 0.811 bits are needed”), the library function for that is essentially optimal.

Call it arithmetic decoding, then.
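
Here is a rough Python sketch of that decoding direction, i.e. turning a stream of fair bits into Bernoulli(p) flips by lazily refining intervals. This is my own toy version, not any particular library’s implementation, and it uses exact fractions purely to dodge floating-point bookkeeping:

```python
from fractions import Fraction
import random

def biased_flips(p, nflips, randbit=lambda: random.getrandbits(1)):
    """Turn fair random bits into Bernoulli(p) flips via arithmetic decoding.

    The fair bits are read lazily as the binary expansion of a uniform real x.
    [xlo, xhi) is what we know about x so far; [lo, hi) is the interval that
    the flips emitted so far have narrowed [0, 1) down to.  A flip can be
    emitted as soon as [xlo, xhi) sits entirely on one side of the split point.
    Amortized over many flips this consumes about H(p) bits per flip.
    """
    p = Fraction(p)
    lo, hi = Fraction(0), Fraction(1)      # interval selected by emitted flips
    xlo, xhi = Fraction(0), Fraction(1)    # interval known to contain x
    flips, bits_used = [], 0
    while len(flips) < nflips:
        split = lo + p * (hi - lo)
        if xhi <= split:                   # x is certainly below the split
            flips.append(1)                # "heads", probability p
            hi = split
        elif xlo >= split:                 # x is certainly above the split
            flips.append(0)                # "tails", probability 1 - p
            lo = split
        else:                              # undecided: consume one more fair bit
            mid = (xlo + xhi) / 2
            if randbit():
                xlo = mid
            else:
                xhi = mid
            bits_used += 1
    return flips, bits_used

# 2000 flips with P(heads) = 1/4 should cost roughly 0.81 bits per flip.
flips, bits = biased_flips(Fraction(1, 4), 2000)
print(sum(flips) / len(flips), bits / len(flips))
```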

Sorting by comparison requires Ω(n log n) time. Not all sorting methods use comparisons; radix sort is a famous example. It’s touted as “linear time,” but that ignores the length of the keys. If you process m-bit keys in blocks of b bits, it takes O(n·m/b) time. If m/b is really small, then it’s linear. (One good use of radix sort is to sort n things whose keys are between 0 and n, so the key size is log n, which fits into a 64-bit word, two in really bad cases.) But if the key size is hundreds of bytes, things start looking very non-linear.
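
A minimal sketch of that O(n·m/b) accounting, as an LSD radix sort over b-bit blocks (the parameter names are mine):

```python
import random

def radix_sort(keys, key_bits, block_bits=8):
    """LSD radix sort of non-negative integers with key_bits-bit keys.

    Runs ceil(key_bits / block_bits) stable bucketing passes,
    i.e. O(n * m / b) work for m-bit keys processed in b-bit blocks.
    """
    mask = (1 << block_bits) - 1
    for shift in range(0, key_bits, block_bits):
        buckets = [[] for _ in range(1 << block_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]  # stable per pass
    return keys

# 32-bit keys in 8-bit blocks: exactly 4 passes, whatever n is.
data = [random.getrandbits(32) for _ in range(1000)]
assert radix_sort(data, 32) == sorted(data)
```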

Now, about comparisons being non-constant: if you do something like mergesort and keep track of how “far in” a comparison gets before hitting a mismatch between two consecutive keys in a sublist, you can save a lot of comparison time by using that measure to resume comparisons when merging lists. The total comparison time falls quite dramatically.
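
A sketch of that merging trick for strings, in Python. A real LCP mergesort carries the within-sublist prefix lengths over from earlier merges; here I recompute them with a scan just to keep the example short:

```python
def lcp(s, t, start=0):
    """Length of the common prefix of s and t, scanning from `start`
    (the caller guarantees s[:start] == t[:start])."""
    i, n = start, min(len(s), len(t))
    while i < n and s[i] == t[i]:
        i += 1
    return i

def merge_resuming_comparisons(a, b):
    """Merge two sorted string lists, resuming each comparison past the
    prefix already known to match instead of restarting at character 0."""
    out, i, j = [], 0, 0
    la = lb = 0                  # chars a[i] / b[j] share with out[-1]
    while i < len(a) and j < len(b):
        k = lcp(a[i], b[j], start=min(la, lb))  # both share min(la, lb) chars
        if a[i][k:] <= b[j][k:]:                # first mismatch is at index k
            out.append(a[i]); i += 1
            lb = k               # b[j] shares exactly k chars with the new last
            la = lcp(a[i], out[-1]) if i < len(a) else 0
        else:
            out.append(b[j]); j += 1
            la = k
            lb = lcp(b[j], out[-1]) if j < len(b) else 0
    return out + a[i:] + b[j:]

# Keys with long shared prefixes benefit the most:
left  = ["part-0001-aa", "part-0001-ab", "part-0002-zz"]
right = ["part-0001-ac", "part-0003-aa"]
print(merge_resuming_comparisons(left, right))
```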

But the nature of the data Rules All. I would point out to my students that while the name field in the class roster is quite long, there would typically be only a handful of names that agree on two letters and rarely any that agree on three. So comparisons are fast here. OTOH, a lot of data like part numbers tends to have similar prefixes and mostly differs in the last symbols. So radix sort starts off doing a lot of useful work, but then it starts wasting time, too. So a mixed approach is best.

There are a ton of tricks one can use to tweak sorting that the common undergrad Algorithms class doesn’t cover.

I.e., there is no one “best” sorting method*. One has to be very careful about extrapolating a sorting algorithm’s behavior from one domain to another.

And you are unreasonably hung up on certain bit operations but not others. Either you count bits on everything (and get absurd results) or you are practical and treat word-sized operations (log n bits or so, built into the hardware) as constant time (and get sane results). Look at WELL and similar RNGs: a constant number of single-word operations to generate the next word (see the sketch after the footnote).

  • That goes for “Quicksort”. It’s okay on some things. Nothing special on others. YMMV applies.
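
On that “constant number of single-word operations” point, here is Marsaglia’s xorshift64 as a much simpler stand-in for WELL (WELL keeps a bigger state and has better statistical properties, but the per-output cost has the same flavor: a fixed handful of 64-bit shifts and xors):

```python
MASK64 = (1 << 64) - 1

def xorshift64(state):
    """One step of Marsaglia's xorshift64.

    Three shifts and three xors on a single 64-bit word per output;
    `state` must be a nonzero 64-bit integer.  Returns (output, new_state).
    """
    x = state
    x ^= (x << 13) & MASK64
    x ^= x >> 7
    x ^= (x << 17) & MASK64
    return x, x

# Generate a few 64-bit words.
s = 0x9E3779B97F4A7C15
for _ in range(3):
    out, s = xorshift64(s)
    print(hex(out))
```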

Doesn’t radix sort depend on an assumed distribution of the objects to be sorted? Like, if you were sorting names, but by some fluke you had 50 names starting with A and only ten for all other letters combined (and didn’t know that from the outset), then radix sort would perform poorly.

A vanilla radix sort doesn’t care about the distribution; the longer the keys, the worse it gets. There are a ton of adaptations to speed things up in certain cases (buckets and all that), and then the distribution starts to matter.

Check out the research of Persi Diaconis on card shuffling:

(I haven’t read this thread, and have no interest in doing so, but I searched for his name, and was surprised it hasn’t come up.)

That paper has been mentioned at least 2 or 3 times in this thread already :slight_smile: