Applied math question - "optimizing" a matrix

“Optimizing” is the best description I have for what I want to do. I’m trying to set up a system for awarding prizes in a competition where each team gets at most one trophy. Trophies are awared one per category.

There are N categories (columns) and M teams (rows). Each team receives a score in each category, resulting in a M x N matrix. Since each team can have at most one trophy, I can’t just pick the highest score in each category–there would be multiplicity. And since there can only be one winner in each category, i need a way of arranging the matrix so that I’m taking the best value from each column without taking more than one from any row.

The way i thought to do this is to re-order the rows so that a vector taken from the diagonal has the maximum magnitude (norm). If M and N are not equal, I’ll just augment with zeros to make a square matrix.

My questions are:

– Is there a standard matrix manipulation technique which will maximize or otherwise optimize the diagonal while leaving the rows intact?

– Is maximizing really what I want to do, or will there be eigen-somethings involved?

This is something I want to write a simple computer program or Excel macro for, so it needs to be applicable to any generic matrix.

No “eigen-somethings” involved at all since this isn’t the matrix of a linear transformation. It’s not really a matrix in the mathematical sense – just an array. It doesn’t act on any sort of vectors.

Here’s my proposal for you: Take the team with the highest score overall and give them the trophy for their highest-scoring category. Remove them from consideration and repeat. Open contingencies for you to deal with: two teams having the same top score, and one team having more than one top-scoring category.

One idea: you give each cell a priority of the score for that team in that category. You make a priority queue from highest score to lowest. As you process each cell, you check if a trophy has been awarded for that category or to that team. If either is true, you discard the cell. If neither are true, you give the trophy for that category to that team, and mark each off. You halt when the number of trophies you’ve awarded are equal to the total number of trophies, or when you’ve awarded a number of trophies equal to the total number of teams, whichever is the lower number.

You will eventually give a trophy to each team for the highest score they received in a category where no one else got a higher score who did not already receive a trophy for an even higher score.

Example:



Category:  C1 C2 C3 C4
Team:
T1         1  2  3  5
T2         2  3  1  4
T3         3  4  2  1

3 teams, 4 categories: you will halt when you award 3 trophies

Queue (Category, Team, Score):
(4,1,5) (2,3,4) (4,2,4) (1,3,3) (2,2,3) (3,1,3) (1,2,2) (2,1,2) (3,3,2) (1,1,1) (3,2,1) (4,3,1)

First, team 1 gets a trophy for category 4.
Next, team 3 gets a trophy for category 2
Next, team 2 does not get a trophy for category 4 because team 1 already got it
Next, team 3 does not get a trophy for category 1 because they already got a trophy
Next, team 2 does not get a trophy for category 2 because team 3 already got it
Next, team 1 does not get a trophy for category 3 because they already got a trophy
Next, team 2 gets a trophy for category 1
You’ve awarded 3 trophies, so you’re done.

You don’t actually have to sort all the cells, you can just use a heap data structure. If you’re writing the program in C++, you can include the “queue” header file and use the push_heap and pop_heap functions. Note that dealing with two teams getting the same score or a team having the same score for two trophies will produce a certain amount of arbitrariness in the algorithm, depending upon exactly how the priority queue gets formed. But dealing with those cases is nontrivial regardless of the algorithm, I think.

What if two teams both have a five for the same category?

The order in which you assign the categories will greatly affect the assignments.

Unless there is a natural order to them, that is, each category is seen as more or less important than all others, you can expect some flack from the non-winners.

If the awards are more less equal and N=M, you might as well put all the teams in a hat and let the captain chose the award his team wants from those remaining in the order of the draw. This particular case seems to be negate the whole idea of awarding trophies for results.

It all depends upon what you decide will be your secondary priority. If you know a priori what that priority is, then you combine that with the score to better sort the cells. If you don’t know, then at every step you will pull out all the cells with identical priorities, and decide then which ones to accept and which ones to reject. You could also defer the decision, marking all trophies and teams with that score as “used”, but then refering back to that list whenever that trophy/team is hit again.

Example to follow sometime after lunch.

It’s not clear from your OP exactly what you’re trying to maximize. Since you’re talking about maximizing the “norm” of some diagonal vector it sounds like you’re trying to maximize the sum of the chosen “scores” (or their squares, or something). This is a well-known problem in optimization (called the “assignment problem”). It’s not really a matrix manipulation problem (since, as Mathochist says, it’s not a linear transformation you’re looking for), but there are efficient (polynomial-time) techniques for solving the problem. The algorithms proposed above (by Mathochist and Punoqllads) are greedy algorithms, which are fast but are not guaranteed to maximize the total score.

One polynomial-time algorithm which is guaranteed to find the optimum solution is called the Munkres algorithm (this link only explains the algorithm for a square matrix, but it can be generalized). Another algorithm (actually not polynomial-time in the worst-case, but often faster than Munkres) is the Auction algorithm (I didn’t find any good online explanations in a quick Google search).

No argument here. :stuck_out_tongue: But it’s not my system, and it’s set up for little kids. I was merely asked to come up with an objective way of dealing with this train wreck.

And there is a logical hierarchy of the categories. Some are better than others. So does that mean I use Punoqllads’s method, but with the most important category as C1 and the least as CM?

This sounds like it could map onto an instance of the stable marriage problem with the categories as the women and the teams as the men. The algorithm to solve that is dirt simple and should be easy to find on Google.

I don’t think it’s quite a stable marriage problem; there’s only one set of rankings (not one of teams for categories and another of categories for teams).

Not explicitly, but if all trophies are equally valuable, you can just assume that teams have no preference between any pair of trophies. Or, if trophies are not equally valuable, just make a team’s preference for the value of a trophy equal to the value of that trophy.

It’s not necessarily the best method to solve this problem, but it is an option.

Actually, now that I think about it, the stable marriage problem is not a good model for this. If either the men or the women all have the same preferences for the other set, making pairs at random is guaranteed to lead to a stable situation, and you probably want something here with a little more insight.

I guess I was trying to say that it seemed like an easier problem than stable marriage. The optimum-stable-marriage algorithms I see also seem to optimize the sums of the rankings (numbered 1-M); if aerodave’s “scores” are more general (as in Punoqllads’s example) it starts sounding more like assignment to me.

I’m trying to figure out what the stable marriage thing is all about…but in the meantime I’ll add this:

The inputs to the “matrix” aren’t originally scores, but rankings in each category. I had assumed that it would be easier to translate rankings 1 thru M into numerical “points.”

In reality, the teams are ranked 1 thru M in each category, and the categories can be assigned varying levels of importance. These could be sorted such that winning Trophy 2 is more deirable than winning Trophy 5. If a team was the clear winner in both categories (ignoring other contingencies) the team would be awarded Trophy 2, and someone else would get Trophy 5.

The reason this is so complex is there there will probably be a dozen or more categories, and 20 or so teams. It could get messy to figure it out all by hand in a short time.

Stable Marriage Problem

As I stated above, unless there’s reason to believe that two teams might have different preferences for which trophy they get, it’s not really a very good model.

Oh, if there’s a distinct hierarchy of the awards, then you want to traverse the cells from the best category to the least category. At each step, you give the trophy to the best team that has not yet gotten a trophy, and go on to the next trophy. For my example above, assuming C1 > C2 > C3 > C4, doing so would give team 3 trophy #1, team 2 trophy #2, and team 1 trophy #3. Even though team 1 did better in category 4 than category 3, you give them the category 3 trophy since it’s more desirable.

If you have a set of categories that you deem equally desirable, then when you’ve dealt with all of the categories more desirable than them, you can make a smaller array of all the remaining teams and just those equally-desirable categories, and do a priority queue as I mentioned above just for that set. If C1 is the most desirable category, but C2, C3, and C4 are all equally desirable, you would first give trophy #1 to team 3, then give trophy #4 to team 1, and finally trophy #2 to team 2.

If you have ties, you ultimately need some kind of tiebreaker value to immediately decide the result, or you need to pick randomly. Otherwise, you can get into pathalogical cases that make the BCS bowl system look like tiddlywinks.

Where was “optimum” clearly defined enough to use this term here?

It’s well-defined in terms of the assignment problem, which is quite clearly what Omphaloskeptic was discussing.

But how do we know that this is truly what the OP considers “optimal”?

Omphaloskeptic’s algorithm solves the OP’s conjectural model, but there’s as yet no justification that that is at all an appropriate model. In fact, I’d take any mathematical model from the OP with a grain of salt, since the reference to “eigen-somethings” sounds like a shaky grasp of the ideas involved.

As software specifications go, the OP is way more technical than what I usually work with. The question of whether a piece of software is really what the specifications ask for is the fundamental issue of software design (and make no mistake, that’s what we’re doing here). Ultimately, the best solution that anyone’s come up with is to bring the software to the requestor and ask them what they think of it.