xkcd thread

I’m not laughing, but I want to program this now and see what it does to various sequences.

I just forwarded it to my brother-in-law the statistics prof.

Maybe not funny, but definitely fascinating. Something I’ll have to play around with. Can it be proven to always converge?

If you two start arguing I’ll have to call a Mod to mediate.

I’d be shocked if it couldn’t. You can ignore the first application of F and just look at the second and following, so you have only three “random” numbers to think about and not “every possible set of numbers”. There probably need to be some restrictions on signs though, to avoid breaking the geometric mean.

I played around with some numbers and got Randall’s result. For [1 10 100 1000 10000] as the original set, the convergence is to 308.6637.

There’s probably no easy way to explain this joke to someone who has no idea what any of you are talking about, is there?

Probably not :wink:

The “joke”, such as it is, is that arithmetic mean, geometric mean, and median are all different ways to calculate a mathematical average. Randall here is “averaging” the words used to refer to those methods, and creating a mathematical function that’s sort of an average of averages.

I’ll try, though.

F() is a function that calculates three different “averages” of a bunch of numbers.

If you give the three averages to F(), then it calculates three averages of those averages.

Now call it again with the averages of averages and get three average-average-averages.

Lather-rinse-repeat until the three numbers you get back keep being the same.

It’s not particularly useful, it’s just kind of interesting… for small values of “interesting”…

There are many different ways to take a set of numbers and create a single number somehow representative of it. You can take the average (adding all the numbers and dividing by how many numbers there are), you can take the geometric mean (multiplying all the numbers and taking the n-th root of the product), and you can find the median (sorting all the numbers and taking the middle one). For [1 1 2 3 5] the average is 12/5 = 2.4, the geometric mean is the fifth root of 30 (or 1.974) and the median is 2. Randall then takes those three ways of describing [1 1 2 3 5] and takes the average, geometric mean, and median again, and again and again to see what happens. You eventually get to a point where you have three numbers that are equal - which means that the average, geometric mean, and median of those three numbers don’t change. Wheee!
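For anyone who wants to try this at home, here is a minimal Python sketch of the procedure described above, assuming all inputs are positive. The names `F` and `gmdn`, the tolerance, and the iteration cap are illustrative choices, not anything taken from the comic.

```python
import math
from statistics import median


def F(values):
    """Return (arithmetic mean, geometric mean, median) of positive numbers."""
    n = len(values)
    arithmetic = math.fsum(values) / n
    # Geometric mean computed through logarithms; this requires every value > 0.
    geometric = math.exp(math.fsum(math.log(v) for v in values) / n)
    return (arithmetic, geometric, median(values))


def gmdn(values, tol=1e-12, max_iter=1000):
    """Apply F repeatedly until the three results agree to within tol."""
    triple = F(values)
    for _ in range(max_iter):
        if max(triple) - min(triple) <= tol:
            break
        triple = F(triple)
    return triple[0]


print(F([1, 1, 2, 3, 5]))     # roughly (2.4, 1.974, 2)
print(gmdn([1, 1, 2, 3, 5]))  # roughly 2.089
```

The first call reproduces the three numbers worked out by hand above; the second keeps feeding the triple back in until the spread between the three values is negligible.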

Thanks for explaining. I know about the three kinds of averages and how to calculate them. I just didn’t know the math terminology well enough to recognize that was what that equation was doing. It’s…still not very funny, is it?

It’s funny in the way that watching someone get bogged down in details and lose the big picture is funny - you can imagine someone thinking “I want to describe this data set - but is average, median, or geometric mean the best way? I know - I’ll do all three!”

xkcd’s tagline

I don’t see “funny” in there anywhere.

If it gets people to think about something in a way that they have not thought of before, then he has done his job.

That was the first thing I did!

Gmdn(-1,2,3) ~ 1.443218 + i 0.402679
Gmdn(1,-2,3) ~ 0.946341 + i 0.983653
Gmdn(1,2,-3) = 0
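Experiments like these depend on conventions the posts do not spell out. The sketch below is one possible setup, not necessarily the one used for the figures above: it assumes the geometric mean is the principal n-th root of the product, and that the median is the middle element after sorting by real part and then imaginary part. The names `F_complex`, `gmdn_complex`, and `lexical` are made up for illustration, and other conventions can give different values.

```python
def lexical(z):
    """Sort key: real part first, then imaginary part (an assumed convention)."""
    return (z.real, z.imag)


def F_complex(values, sort_key=lexical):
    """Return (mean, principal-root geometric mean, median) for complex inputs."""
    zs = [complex(v) for v in values]
    n = len(zs)
    arithmetic = sum(zs) / n
    product = complex(1)
    for z in zs:
        product *= z
    geometric = product ** (1 / n)  # principal n-th root of the product
    ordered = sorted(zs, key=sort_key)
    mid = n // 2
    med = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return (arithmetic, geometric, med)


def gmdn_complex(values, sort_key=lexical, tol=1e-12, max_iter=10_000):
    """Iterate F_complex until the three values are within tol of each other."""
    triple = F_complex(values, sort_key)
    for _ in range(max_iter):
        spread = max(abs(a - b) for a in triple for b in triple)
        if spread <= tol:
            break
        triple = F_complex(triple, sort_key)
    return triple[0]


print(gmdn_complex([-1, 2, 3]))
print(abs(gmdn_complex([1, 2, -3])))  # very close to 0
```

The (1, 2, -3) case collapses to zero because the arithmetic mean of the inputs is exactly 0, which makes the geometric mean 0 on every later pass.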

It’s not particularly funny, but it is interesting.

Proof of convergence is easy: All three of those “averages” are guaranteed to return a number no smaller than the least input and no larger than the greatest input, and the two means land strictly in between unless all the inputs are equal, so the range from least to greatest will always be shrinking. And it’s not too hard to quantify how it shrinks, so you won’t get a situation where you’re asymptotically approaching some nonzero range.
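The shrinking-range argument can be watched numerically. The sketch below is an illustration rather than a proof: it assumes positive inputs, and the helper name `spreads` and the choice of 15 steps are arbitrary. It records the gap between the largest and smallest value after each application of F.

```python
import math
from statistics import median


def F(values):
    """Return (arithmetic mean, geometric mean, median) of positive numbers."""
    n = len(values)
    return (
        math.fsum(values) / n,
        math.exp(math.fsum(math.log(v) for v in values) / n),
        median(values),
    )


def spreads(values, steps=15):
    """Return max - min of the inputs, then after each application of F."""
    history = [max(values) - min(values)]
    current = tuple(values)
    for _ in range(steps):
        current = F(current)
        history.append(max(current) - min(current))
    return history


for step, spread in enumerate(spreads([1, 10, 100, 1000, 10000])):
    print(step, spread)
```

For this starting set the gap drops from 9999 to well under 1 within the 15 steps, and it never grows from one step to the next.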

At least, if you’re starting with positive values. Including negatives ends up dealing with complex numbers, as @Pleonast found. How are you taking the median of a set of complex numbers? Like, what’s the median of -2, (1+sqrt(-3)), (1-sqrt(-3))?

Good question. Some software might use the median real part, and some might output the median absolute value.

It’s worse than that, even, because with either method of ordering complex numbers, you can have two different numbers with the same ordering. Do you pick one of those two as the median (and if so, which one?), or take the arithmetic mean between all of the tied points?

Looks like MATLAB sorts a vector by absolute value, and takes the center point (or for an even number of points, averages the two centermost points), which means that you can get some surprising results. Median of (i, 1, -1) is i, median of (-1, i, -1) is -1, but median of (-1, i, -1, 1) is (i-1)/2.
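Those three results can be reproduced in Python under one assumption that the post does not state: that ties in absolute value are broken by phase angle, smallest first. The function name `matlab_like_median` is hypothetical; this is an emulation of the described behaviour, not MATLAB's actual code.

```python
import cmath


def matlab_like_median(values):
    """Median after sorting by absolute value, then by phase angle (assumed)."""
    ordered = sorted(
        (complex(v) for v in values),
        key=lambda z: (abs(z), cmath.phase(z)),
    )
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    # Even count: average the two centermost points.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(matlab_like_median([1j, 1, -1]))      # 1j
print(matlab_like_median([-1, 1j, -1]))     # (-1+0j)
print(matlab_like_median([-1, 1j, -1, 1]))  # (-0.5+0.5j)
```

All four inputs have absolute value 1, so the phase-angle tie-break alone decides the order, which is why the answers look arbitrary.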

I was using the default NumPy median function, which uses a lexical sort. That is, it sorts complex numbers by real component, then imaginary component. For your example, the sorted list would be [-2, 1-sqrt(-3), 1+sqrt(-3)].

While easy to implement, I find that distasteful. For example, the median of [-1, i 100, 1] is i 100. So I implemented my own median function, which sorts by magnitude, then real component, then imaginary component.

That changes previous calculations to
Gmdn(-1,2,3) ~ 1.423165 + i 0.831516
Gmdn(1,-2,3) ~ 0.927599 + i 0.308025
Gmdn(1,2,-3) = 0

Not a big difference, actually.
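To make the two orderings concrete, here is a small pure-Python sketch that compares them on the examples from this exchange. It does not call NumPy; the lexical key only imitates the real-then-imaginary ordering described above, and the names `median_by`, `lexical`, and `magnitude_first` are invented for illustration.

```python
import math


def median_by(values, key):
    """Middle element of an odd-length list under the given sort key."""
    ordered = sorted((complex(v) for v in values), key=key)
    return ordered[len(ordered) // 2]


def lexical(z):
    """Real component first, then imaginary component."""
    return (z.real, z.imag)


def magnitude_first(z):
    """Magnitude first, then real component, then imaginary component."""
    return (abs(z), z.real, z.imag)


sample = [-1, 100j, 1]
print(median_by(sample, lexical))          # 100j
print(median_by(sample, magnitude_first))  # (1+0j)

# Lexical order of the earlier example -2, 1+sqrt(-3), 1-sqrt(-3).
r3 = math.sqrt(3)
print(sorted([complex(1, r3), complex(-2), complex(1, -r3)], key=lexical))
```

Under the lexical key the outlier 100i lands in the middle because its real part is 0; sorting by magnitude first pushes it to the end instead.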