Perfect Pairs: A Possibly Amazingly Hard Math Question

No, I’m not taking about the Sesame Street sequence.

Quite awhile back my father asked by best friend, a programmer, and I, not a programmer but a pretty smart guy, to design a computer program that would allow for a perfect ranking of X choices (he was thinking of candidates for a job promotion but that doesn’t matter) by presenting a person with a stream of paired choices. The program must ask enough paired questions to know enough to rank every choice, in order, and cannot allow a conflicting choice.

So, to use just 4 choices of A, B, C or D, the computer might ask:

A or B? I choose B.
C or D? I choose C.
A or C? I choose A.

The computer can now rank them B-A-C-D, since we know B is preferred to A, A to C, and C to D.

If that seems simple, brother, it ain’t. You can make this complicated:

A over B
C over D
B over D
But you don’t know if it’s C over B or B over C so now you have to ask that too.

So it might take four picks. And getting the computer to grasp this and not have it accidentally ask you a question that could contradict a necessary ranking… and this is for four choices. Imagine it for 24.

Well, we couldn’t figure it out for the life of us. Just could not put together a logarithm that would explain it. And yet it seems like it should be so easy. Is this as hard as it looks?

This almost has to be a joke, right?

Every sorting algorithm works on exactly this scheme.

Try looking up “bubble sort” or “quick sort” for the details.

Arrow’s Paradox.

Technically, that applies only to groups rather than individuals. Rankings have a series of deep results like this, though. And that’s assuming an individual with a perfectly consistent ranking scale. In the real world, people often prefer A to C and F to G but when asked G to C come up with random answers.

In the real world, bubble sorts are inefficient for large sets as well, but I can’t tell whether you’re asking for real world or theoretic information.

A simple recursive algorithm would go something like this:

Take all the people to rank in a random order.

Let’s say the real order is alphabetical and you have them in this order:

D G A F E C J B K H I

so take the one in the middle (in this case C)

now ask for each other one compared to C

you’ll end up with

A B C D G F E J K H I

where you know A and B are before C and the rest are after C but you don’t know the rankings within those.

So you recursively do the same thing on the two smaller sets.

For a set of just 2, you can ask for the 2 and you’re done. For a set of just 3, you can ask each of the 3 pair possibles and you’re done. For more than that, pick a middle one (or if it’s an even number, the first of the 2 middle ones) and proceed as above.

In this case you’d pick E as the middle one in the bigger than 3 list you have left, and end up with:

D E G F J K H I

then pick J and end up with

G F H I J K

then pick F and end up with

F G H I

then you’re down to one of your trivial cases.

I forget the name of this algorithm, a split sort maybe?

Isn’t this just a problem requiring a sorting algorithm? Specifically, a comparison sort? I thought that algorithms for this were fairly well-known, so maybe I’m missing something here.

Well, I think I found at least part of your problem :slight_smile:

But yea, assuming I’m reading the OP right, its a pretty common problem in computer science and there are a large number of algorithims that can solve it with varying efficiencies. Your best friend needs to brush up on his numerical methods.

It’s not clear to me exactly what problem OP is trying to solve, but it might be the Oblivious Sort with Minimal Number of Comparators problem, also discussed here and here, which is intriguingly difficult.

The 2nd of the 3 links above mentions that Daniel Hillis addressed the problem with Genetic Algorithm and that "Even a 25-year old result for the upper bound on the number of comparators for the 13-input case has been improved by one comparator! "

Maybe I’m misunderstanding, but I think he’s asking for an algorithm to always get the most efficient sort, without any extra comparisons. The answers in this thread seem to only be giving various sorting methods that vary in efficiency depending on the data. None of them determine ahead of time what sorting algorithm is most efficient.

EDIT: Argh, septimus!

As I’m reading it, the problem in the OP is this, except that you get to see the outcomes of the first k comparisons before you have to decide what the (k + 1)st is. Given that, I’m having a lot of trouble seeing why this isn’t just an ordinary sorting problem.

But that problem is much less interesting than the Oblivious Sort problem. :wink:

Though still non-trivial. N=5 seems like it might always be solvable with 7 comparisons. (Since 5! < 2[sup]7[/sup]) but, IIRC(*), 8 comparisons are sometimes needed.

(* - and BTW I did use to recall such things correctly with some regularity, but less lately, partly due to apathy and partly due to creeping senility.)

As everyone said, this is the classic computer science problem of sorting an arbitrary linearly ordered type, one of the first discussions had in any algorithms class.

But in particular, I am confused that the OP thinks this would be complicated. If you don’t care about maximizing efficiency, it’s not tricky at all:

First, compare B to A. Then write out the result, whatever it is: A < B or B < A.

Next, compare C to the lowest thing you have. If C is lower, then you’re done with C. Otherwise, compare C to the next lowest thing you have. Keep going till you figure out C’s place.

Then, compare D to the lowest thing you have, then the second lowest, then the third lowest, etc., stopping once you find out D’s place. And so on and so on.

This is called insertion sort. It’s not the most efficient sorting algorithm (its average- and worst-case behavior are both what’s called Theta(n^2), meaning it takes time proportional to the square of the number of items, while better sorting algorithms are what’s called Theta(n log n)), but it’s a very straightforward approach to solving the problem. There’s no need to consider the problem complicated.

But I think that the OP does want the most efficient and that it is dynamic. So for example if I know A>B, C>D, E>F what should the next comparison be? Lets say B and C, If B>C, what next? If B<C, what next?

No matter how you approach it, this is a comparison-based sort, so your worst case is never better than O(n*log(n)). As a result, putting the items in some order and doing merge sort is going to be pretty much impossible to beat.

Am I the only one who [del]hoped[/del] thought this was going to be about women’s breasts?

Well, quicksort is usually somewhat faster than mergesort (though not by much), and there are a bunch of other algorithms that are competitive with either. It’s not impossible to beat mergesort, just impossible to beat it by a significant margin.

Well, unless you use linear bogosort, but that has a rather high overhead, since one of the steps in the algorithm is “destroy the Universe”.

Ah, yes.

import universe U;
~ATH(U) {
}
EXECUTE(52CardPickup)

Too obscure?

Well, what do you mean by “the most efficient and that it is dynamic”?

The one concern the OP outlined is “Never ask a question to which you already know the answer.” Insertion sort satisfies that criterion. It never asks a redundant question, in that sense.

Another concern one might have is for asymptotic worst-case efficiency; if that’s the concern, insertion sort is not suitable, but any Theta(n log n) worst-case algorithm will do.

Another concern is “Devise an algorithm which, for every input, asks the least number of questions possible (for that input) to know how to put the input in order”. But this is impossible! For any input of length n, if you happen to ask the right n - 1 questions, you could put it in order after those n - 1 questions (if, by random happenstance, you happen to first compare the lowest and second lowest, then the second lowest and third lowest, and so on). But there’s no algorithm which sorts all inputs with just n - 1 questions (n - 1 questions could only produce 2^(n - 1) possible responses; however, there are n! possible orderings, so once n > 2, this isn’t enough questions to cover every case). So this is not an achievable goal.

For example, let’s look at the case with just three inputs.

No matter what, you have to start by picking two of them and comparing them. They might as well be the first two; every other choice is essentially the same.

Then, you have to pick the remaining one and compare it to one of the first two. It might as well be compared to the higher of the first two; comparing it to the lower would be essentially the same algorithm, just viewed through a mirror.

Either the third item is greater than the higher of the first two and you’re done (taking 2 comparisons in these 2 out of 6 possible cases) or the third item is lower than the higher of the first two and you must now compare it to the lower of the first two to be done (taking 3 comparisons in these 4 out of 6 possible cases).

Every sorting algorithm for 3 items which does not ask redundant questions is essentially the same. In 2 cases, it takes 2 comparisons, and in 4 cases, it takes 3 comparisons.

So every sorting algorithm for 3 items asks “more questions than necessary”, in some sense, for 4 out of 6 possible inputs.

There’s no meaningful answer to the question “What should the first comparison be?”. No matter what comparison you make first, it will sometimes be less efficient than another comparison would have been. And similarly for “Given that A < B, what comparison should I make next?”, and such.

So it’s not clear to me what is meant by the claims that the OP wants the “most efficient” algorithm in such a sense. And, indeed, it’s not clear to me that the OP wanted any more efficiency than the one thing they stated explicitly, “Do not ask questions to which you can already determine the answer”, a criterion already met even by lowly insertion sort, despite its less-than-optimal asymptotic behavior, and for that matter, easily met by any approach to sorting augmented with the appropriate bookkeeping.