Random Shuffle - expected distributions

OK, I can see how a 25% chance needs less than 2 bits on average, because half the time, you can stop after the first bit, and not need to call the second one at all. But I can’t see how you could do it in less than 1 bit on average, because that means that you’d sometimes be using zero bits, and how do you decide if you’re in the zero-bit case without using any randomness?

Unless you’re referring to a large number of such “biased coin flips” in series, and we’re using extra entropy left over from one of the previous trials, I suppose.
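
To make the “stop after the first bit” idea concrete, here is a tiny Python sketch (the function name and the use of random.getrandbits are just my choices for the illustration):

```python
import random

def flip_quarter(randbit=lambda: random.getrandbits(1)):
    """Return True with probability 1/4, using 1.5 fair bits on average.

    Half the time the first bit is 0 and we stop immediately (answer False);
    otherwise a second bit splits the remaining 1/2 into two 1/4 outcomes.
    """
    if randbit() == 0:
        return False           # probability 1/2, only one bit consumed
    return randbit() == 0      # probability 1/4 for True, 1/4 for False

# Expected bits per call: 1*(1/2) + 2*(1/2) = 1.5
```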

If your question is, what if “H” has probability 25% and “T” has probability 75%, and you want to represent a sequence of coin flips using a string of bits, then you can do it using a simple arithmetic encoding.


The optimal such code will use 0.811278124459132864 bits per flip on average, by Claude Shannon’s entropy formula: -0.25 log2(0.25) - 0.75 log2(0.75) bits, or equivalently -0.25 ln(0.25) - 0.75 ln(0.75) nats.
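
A quick Python check of those numbers (nothing more than plugging into the entropy formula):

```python
from math import log, log2

p = 0.25
h_bits = -p * log2(p) - (1 - p) * log2(1 - p)   # 0.8112781244591328...
h_nats = -p * log(p) - (1 - p) * log(1 - p)     # 0.5623351446188083...
print(h_bits, h_nats, h_nats / log(2))          # the last value equals h_bits
```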

Arithmetic coding converts biased flips to bits (“50-50 bits”). Here we need the inverse problem: converting bits to biased flips. As I mentioned above (“in fact barely 0.811 bits are needed”), the library function for that is essentially optimal.

Call it arithmetic decoding, then.
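
Here is a rough Python sketch of that decoding direction, i.e. turning a stream of fair bits into Bernoulli(p) flips by lazily refining intervals. This is my own toy version, not any particular library’s implementation, and it uses exact fractions purely to dodge floating-point bookkeeping:

```python
from fractions import Fraction
import random

def biased_flips(p, nflips, randbit=lambda: random.getrandbits(1)):
    """Turn fair random bits into Bernoulli(p) flips via arithmetic decoding.

    The fair bits are read lazily as the binary expansion of a uniform real x.
    [xlo, xhi) is what we know about x so far; [lo, hi) is the interval that
    the flips emitted so far have narrowed [0, 1) down to.  A flip can be
    emitted as soon as [xlo, xhi) sits entirely on one side of the split point.
    Amortized over many flips this consumes about H(p) bits per flip.
    """
    p = Fraction(p)
    lo, hi = Fraction(0), Fraction(1)      # interval selected by emitted flips
    xlo, xhi = Fraction(0), Fraction(1)    # interval known to contain x
    flips, bits_used = [], 0
    while len(flips) < nflips:
        split = lo + p * (hi - lo)
        if xhi <= split:                   # x is certainly below the split
            flips.append(1)                # "heads", probability p
            hi = split
        elif xlo >= split:                 # x is certainly above the split
            flips.append(0)                # "tails", probability 1 - p
            lo = split
        else:                              # undecided: consume one more fair bit
            mid = (xlo + xhi) / 2
            if randbit():
                xlo = mid
            else:
                xhi = mid
            bits_used += 1
    return flips, bits_used

# 2000 flips with P(heads) = 1/4 should cost roughly 0.81 bits per flip.
flips, bits = biased_flips(Fraction(1, 4), 2000)
print(sum(flips) / len(flips), bits / len(flips))
```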

Sorting by comparison requires Ω(n log n) time. Not all sorting methods use comparisons; radix sort is a famous example. It’s touted as “linear time,” but that ignores the length of the keys. If you process m-bit keys in blocks of b bits, it takes O(n·m/b) time. If m/b is really small, then it’s linear. (One good use of radix sort is to sort n things whose keys are between 0 and n, so the key size is log n, which fits into a 64-bit word, two in really bad cases.) But if the key size is hundreds of bytes, things start looking very non-linear.
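
A minimal sketch of that O(n·m/b) accounting, as an LSD radix sort over b-bit blocks (the parameter names are mine):

```python
import random

def radix_sort(keys, key_bits, block_bits=8):
    """LSD radix sort of non-negative integers with key_bits-bit keys.

    Runs ceil(key_bits / block_bits) stable bucketing passes,
    i.e. O(n * m / b) work for m-bit keys processed in b-bit blocks.
    """
    mask = (1 << block_bits) - 1
    for shift in range(0, key_bits, block_bits):
        buckets = [[] for _ in range(1 << block_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]  # stable per pass
    return keys

# 32-bit keys in 8-bit blocks: exactly 4 passes, whatever n is.
data = [random.getrandbits(32) for _ in range(1000)]
assert radix_sort(data, 32) == sorted(data)
```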

Now, about comparisons being non-constant: if you do something like mergesort and keep track of how “far in” a comparison gets before hitting a mismatch between two consecutive keys in a sublist, you can save a lot of comparison time by using that measure to resume comparisons when merging lists. The total comparison time falls quite dramatically.
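
A sketch of that merging trick for strings, in Python. A real LCP mergesort carries the within-sublist prefix lengths over from earlier merges; here I recompute them with a scan just to keep the example short:

```python
def lcp(s, t, start=0):
    """Length of the common prefix of s and t, scanning from `start`
    (the caller guarantees s[:start] == t[:start])."""
    i, n = start, min(len(s), len(t))
    while i < n and s[i] == t[i]:
        i += 1
    return i

def merge_resuming_comparisons(a, b):
    """Merge two sorted string lists, resuming each comparison past the
    prefix already known to match instead of restarting at character 0."""
    out, i, j = [], 0, 0
    la = lb = 0                  # chars a[i] / b[j] share with out[-1]
    while i < len(a) and j < len(b):
        k = lcp(a[i], b[j], start=min(la, lb))  # both share min(la, lb) chars
        if a[i][k:] <= b[j][k:]:                # first mismatch is at index k
            out.append(a[i]); i += 1
            lb = k               # b[j] shares exactly k chars with the new last
            la = lcp(a[i], out[-1]) if i < len(a) else 0
        else:
            out.append(b[j]); j += 1
            la = k
            lb = lcp(b[j], out[-1]) if j < len(b) else 0
    return out + a[i:] + b[j:]

# Keys with long shared prefixes benefit the most:
left  = ["part-0001-aa", "part-0001-ab", "part-0002-zz"]
right = ["part-0001-ac", "part-0003-aa"]
print(merge_resuming_comparisons(left, right))
```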

But the nature of the data Rules All. I would point out to my students that while the name field in the class roster is quite long, there would typically be only a handful of names that agree on two letters and rarely any that agree on three. So comparisons are fast here. OTOH, a lot of data like part numbers tends to have similar prefixes and mostly differs in the last symbols. So radix sort starts off doing a lot of useful work, but then it starts wasting time, too. So a mixed approach is best.

There are a ton of tricks one can use to tweak sorting that the common undergrad Algorithms class doesn’t cover.

I.e., there is no one “best” sorting method*. One has to be very careful about extrapolating a sorting algorithm’s behavior from one domain to another.

And you are unreasonably hung up on certain bit operations but not others. Either you count bits on everything (and get absurd results) or you are practical and treat word-sized operations (log n bits or so, built into the hardware) as constant time (and get sane results). Look at WELL and similar RNGs: a constant number of single-word operations to generate the next word (see the sketch after the footnote).

  • That goes for “Quicksort”. It’s okay on some things. Nothing special on others. YMMV applies.
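
On that “constant number of single-word operations” point, here is Marsaglia’s xorshift64 as a much simpler stand-in for WELL (WELL keeps a bigger state and has better statistical properties, but the per-output cost has the same flavor: a fixed handful of 64-bit shifts and xors):

```python
MASK64 = (1 << 64) - 1

def xorshift64(state):
    """One step of Marsaglia's xorshift64.

    Three shifts and three xors on a single 64-bit word per output;
    `state` must be a nonzero 64-bit integer.  Returns (output, new_state).
    """
    x = state
    x ^= (x << 13) & MASK64
    x ^= x >> 7
    x ^= (x << 17) & MASK64
    return x, x

# Generate a few 64-bit words.
s = 0x9E3779B97F4A7C15
for _ in range(3):
    out, s = xorshift64(s)
    print(hex(out))
```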

Doesn’t radix sort depend on an assumed distribution of the objects to be sorted? Like, if you were sorting names, but by some fluke you had 50 names starting with A and only ten for all other letters combined (and didn’t know that from the outset), then radix sort would perform poorly.

A vanilla radix sort doesn’t care about the distribution; the longer the keys, the worse it gets. There are a ton of adaptations to speed things up in certain cases (buckets and all that), and then the distribution starts to matter.

Check out the research of Persi Diaconis on card shuffling:

(I haven’t read this thread, and have no interest in doing so, but I searched for his name, and was surprised it hasn’t come up.)

That paper has been mentioned at least 2 or 3 times in this thread already :slight_smile: