Does this computing problem have a name/solution/known algorithm?

I have a coding issue to solve at the moment, and I’m attempting to research ways people might have solved similar problems in the past.

This is running into the problem that I don’t really know what to call my issue, or what branch of theory might be the right one to look at for solutions!

This is the (bare bones) problem.

I have two sets of items - say A, B, C, D… in the first set and 1, 2, 3, 4… in the second.

An arbitrary number of items in the first set may match to an arbitrary number of items in the second.

e.g.
A => 1,2
B => 1,2,3
C => 3
D => 3,4

I need to organise these items into a minimal set of “match specifications”. The rules for these are that every item on the left matches everything on the right. So, the above four lines are in fact a valid set of “match specifications” for this particular instance … but not the most efficient one. That would be …

A,B => 1,2
B,C,D => 3
D => 4

or

A,B => 1,2
B,C => 3
D => 3,4

…each of which takes only 3 lines rather than 4.
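
To pin down the rules in code terms: a set of match specifications is valid when expanding every line reproduces the original matches exactly, nothing more and nothing less. A small Python sketch (illustrative only; the names are made up, not my actual code):

mapping = {('A', 1), ('A', 2), ('B', 1), ('B', 2), ('B', 3),
           ('C', 3), ('D', 3), ('D', 4)}

def expand(rules):
    # Expand match specifications into the individual pairs they imply.
    return {(left, right) for lefts, rights in rules
            for left in lefts for right in rights}

rules = [({'A', 'B'}, {1, 2}), ({'B', 'C', 'D'}, {3}), ({'D'}, {4})]
assert expand(rules) == mapping  # valid, and only 3 lines instead of 4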

Clear as mud?

Anyway … my first question is: what sort of problem is this? Set theory? Topology? (I’m grasping at straws here…)

Is it a well-known problem that has a name, like the Travelling Salesman problem or whatever?

Where should I look for help in getting an efficient solution?

Sounds like some combination of set theory and combinatorial theory (a branch/precursor of statistics). It doesn’t remind me of any other well-settled problem, but take that FWIW. I am also trying to figure out why you would be doing this. Might also be able to solve it using some sort of logic reduction method (similar to reduction using Karnaugh maps) but that’s just a WAG based on intuition, hardly a developed solution.

Somebody’s probably solved something similar, but it’s probably not common enough to find it published.

I’m not sure what you mean by “match”.

Step one is to have a well-defined problem. What does A,B => 1,2 mean? For that matter, what does A => 1,2 mean? I can come up with multiple interpretations of both of these. Perhaps you can explain what your intended application is? The way you described it is definitely not the traveling salesman problem. If you can state the problem clearly, there may very well already be a known algorithm for solving it.

I haven’t seen an algorithm for this problem, but I may be able to narrow your search. Your problem is isomorphic to the following problem in graph theory:

What is the minimal decomposition of a bipartite graph into complete bipartite graphs?

See Mathworld for definitions of these terms.

To explain, your first set (A, B, C, D, …) is isomorphic to one side of the bipartite graph, the other set (1, 2, 3, 4, …) is isomorphic to the other side, and your rules (e.g. A,B -> 1,2,3) are isomorphic to complete bipartite graphs.
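
To make the correspondence concrete, here is a quick sketch (mine, purely illustrative) of checking that a rule forms a complete bipartite subgraph, sometimes called a biclique:

# Adjacency sets for the bipartite graph, from the example above.
adj = {'A': {1, 2}, 'B': {1, 2, 3}, 'C': {3}, 'D': {3, 4}}

def is_biclique(lefts, rights):
    # A rule is valid iff every left vertex is adjacent to every right
    # vertex, i.e. (lefts, rights) is a complete bipartite subgraph.
    return all(rights <= adj[left] for left in lefts)

print(is_biclique({'A', 'B'}, {1, 2}))       # True
print(is_biclique({'B', 'C', 'D'}, {1, 3}))  # False: C doesn't match 1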

As to the algorithm, I know that merely finding the largest complete graph that is a subgraph of an arbitrary graph is NP-hard (in other words, intractable). I don’t know if limiting it to bipartite graphs helps, but it might. ccwaterback and nivlac, this should help to clarify.

This is a good place to start. First see if you can find an algorithm for finding the largest complete bipartite subgraph of a bipartite graph. This, in itself, is a difficult problem, and your problem is harder than this.

My intuition is that it is intractable, because:
Suppose we have bipartite graph (A,B,C,…,N; 1,2,3,…,X) and you start with complete bipartite graph A -> 1,2. Now you want to know whether to enlarge this to complete bipartite graph A,B -> 1,2 or A -> 1,2,3 (suppose B -> 3). In order to know which choice is optimal, you must check A,C -> 1,2,3, A,D -> 1,2,3, … A,N -> 1,2,3 against A,B -> 1,2,4, A,B -> 1,2,5, … A,B -> 1,2,X, if you get my meaning. This isn’t a proof, just an intuitive argument that this is a nasty problem.

Good luck.

At first I didn’t understand the problem, but I think it can be phrased like so:

There’s some mapping of elements from set A to set B. The mapping is arbitrary, in that it is only defined by an examination of the relation of the elements in the sets. Give an algorithm to ascertain the least number of “rules” necessary to express the given mapping.

There’s a problem with the formulation of the question. I think what he’s looking for is this: a valid “rule” is one in which each element from set A maps to the same elements from set B. In other words, they have a common mapping. That is, if:

A => 1
B => 1,2
C => 2

then the rules:

A,B => 1
B,C => 2

are valid. However, if:

A => 1
B => 1,2
C => 3

then at least three rules are needed.

A brute-force algorithm that I think will work is to take the element from set A with the greatest out-degree, then see if any other elements from set A have common mappings. If the mappings overlap but are not identical, the rule being constructed needs to be split. (I think you’d want to take a mapping that is a subset where possible, being greedy.) Given your example:

A => 1,2
B => 1,2,3
C => 3
D => 3,4

Take:
Rule 1: B => 1,2,3

Both A and C have common mappings. Add A, since it has a larger out-degree (we’re being greedy) and split the rule:
Rule 1: A,B => 1,2
Rule 2: B => 3

Now you’re left with C and D. Being greedy, take D. It can be partially added to Rule 2, but must have an additional rule to account for the mapping to 4:
Rule 1: A,B => 1,2
Rule 2: B,D => 3
Rule 3: D => 4

Now you’re left with C, which coincides with Rule 2, so just add it:
Rule 1: A,B => 1,2
Rule 2: B,C,D => 3
Rule 3: D => 4

Naturally, this would be uglier if the mappings of A weren’t a subset of those of B. I seem to have misplaced my algorithms book, otherwise I’d look for an actual known problem. This seems like an (iterated) covering set problem to me, but I may misunderstand what you’re asking.
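
To make “common mappings” concrete, the first couple of steps of that greedy idea might look like this in Python (a sketch under my reading of the problem):

mapping = {'A': {1, 2}, 'B': {1, 2, 3}, 'C': {3}, 'D': {3, 4}}

# Start from the element with the greatest out-degree...
seed = max(mapping, key=lambda row: len(mapping[row]))  # 'B'

# ...then see what every other element has in common with it.
overlaps = {row: mapping[seed] & mapping[row]
            for row in mapping if row != seed}
print(overlaps)  # A shares {1, 2}; C and D each share just {3}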

Y’know, on further reflection, perhaps a better way to visualize this is to try something like a Karnaugh map (as CookingWithGas suggested). Make a matrix of the elements (how does one get fixed width fonts in a post?) like so:

+|1-2-3-4
A|x-x----
B|x-x-x--
C|----x--
D|----x-x

Take out the largest group of x’s that coincide. As in a Karnaugh map, “blocks” of commonality – that is, squares or rectangles of x’s – are best. In this case, it’s A,B and 1,2. (Of course, finding it algorithmically is the problem here. I don’t know how to do it.) The next largest group is B,C,D and 3. Then you’re just left with D, giving:

A,B => 1,2
B,C,D => 3
D => 4
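
For what it’s worth, on a problem this small you can find that largest block by sheer enumeration. A brute-force sketch (mine, exponential in the number of rows, so illustration only):

from itertools import combinations

adj = {'A': {1, 2}, 'B': {1, 2, 3}, 'C': {3}, 'D': {3, 4}}

# For each subset of rows, the columns they all share form a block of
# x's; keep the biggest. Rows and columns need not be adjacent.
best = max(
    ((set(rows), set.intersection(*(adj[r] for r in rows)))
     for k in range(1, len(adj) + 1)
     for rows in combinations(adj, k)),
    key=lambda block: len(block[0]) * len(block[1]))
print(best)  # ({'A', 'B'}, {1, 2}), the 2x2 block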

On preview, I note that jawdirk mentioned that he thinks it’s a bipartite graph problem. Either way, it’s nasty. And, if either one of us is correct, it’s NP-hard or NP-complete. But I hope that helped some…

Sorry, I felt the need to post a slight clarification:

I should have been more explicit. “Being greedy” means taking the mapping with the largest number of common elements. So my earlier step (“Now you’re left with C and D. Being greedy, take D”) isn’t quite correct, as both C and D have only one element in common with Rule 2. Either one would be acceptable.

Still scratching my head here, but you faster folks can continue on and leave me behind.

From what I see, your “relationship” is NOT a function according to Mathworld.

http://mathworld.wolfram.com/Function.html

Another correction:

I (now) don’t think it’s a covering set problem; it was late and I was kinda tired. More like an iterated largest (or greatest) common subset problem. The problem here is that, if I search on that, I don’t really see anything that matches my understanding of the problem.

Most definitely not a function. However, the term mapping is appropriate, which puts it in the realm of set theory.

Aspidistra – you still there? Did anyone answer (or at least help with) your question?

Well, no. But that’s of little concern.

I know I’ve looked at a problem very similar to this one, and possibly exactly the same, and proved it NP-complete. I think I used a reduction to vertex-cover.

Sorry, make that a reduction from vertex-cover.

Is this equivalent to matrix rank? For example, the OP’s initial case could be represented by a matrix filled with 1s and 0s:



|A|   |1 1 0 0 |   |1|
|B| = |1 1 1 0 | * |2|
|C|   |0 0 1 0 |   |3|
|D|   |0 0 1 1 |   |4|


The matrix happens to have rank 3, and that seems to be the answer. Maybe the matrix rank isn’t always the answer, but it seems like it should at least give an upper bound, which might be useful.
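
If anyone wants to check that rank claim, and assuming numpy is available, it’s a two-liner plus the matrix:

import numpy as np

M = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1]])
print(np.linalg.matrix_rank(M))  # 3: row B is the sum of rows A and C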

No. Knowing an upper bound for the size of the minimal match would be very useful in a brute force search, but ultimately we’re interested in what the minimal match is, not just how large it is.

I’d say it’s combinatorics, though there may be a geometrical solution. Try representing your correlations as a Boolean matrix (all zeroes and ones). For example, your sample becomes



1100
1110
0010
0011


where “1100” in the first line means that A corresponds to 1 and 2, but not 3 and 4. Now you want to find the least number of all-ones submatrices (whose rows and columns need not be contiguous) needed to cover exactly the 1s.
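
On a toy instance you can even get the exact minimum by brute force. A sketch (mine; exponential, so only good for checking small cases):

from itertools import combinations

mapping = {'A': {1, 2}, 'B': {1, 2, 3}, 'C': {3}, 'D': {3, 4}}
ones = {(r, c) for r, cols in mapping.items() for c in cols}

# Candidate submatrices: each subset of rows, paired with the columns
# those rows all share. A candidate can only ever cover genuine 1s.
candidates = [(rows, cols) for k in range(1, len(mapping) + 1)
              for rows in combinations(mapping, k)
              for cols in [set.intersection(*(mapping[r] for r in rows))]
              if cols]

def covered(rules):
    return {(r, c) for rows, cols in rules for r in rows for c in cols}

# Try 1 submatrix, then 2, ... until every 1 is covered.
for k in range(1, len(candidates) + 1):
    answer = next((rules for rules in combinations(candidates, k)
                   if covered(rules) == ones), None)
    if answer:
        print(k, answer)  # prints 3 and one minimal set of submatrices
        break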

That takes a graph-theoretic problem and transforms it into the same graph-theoretic problem with a different representation of the graph. I’m not sure it’s really helpful, especially to someone whose primary concern is implementation.

I’m pretty well convinced that this problem is NP-complete based on a reduction from vertex cover. I need to look up a few details, and I got meetings and all that crap, so I’ll try to post a proof later. I’ll also think about approximation algorithms.

If I understand you correctly, this looks like a logic minimization problem, first solved by Quine and McCluskey in the early 1950s. Consider the matrix as a Karnaugh map, and find the best cover. There are lots and lots of algorithms and tools for this.

I’m not sure I understand “match” totally, even when rereading the posts, but it looks a bit like a compaction problem. In test compaction you try to merge a number of test vectors into the minimal set. Test compaction also deals with don’t cares. Microcode compaction attempts to merge a number of microoperations into the smallest set, but this is a bit more complicated since it covers dependencies between operations.

I agree that it looks like a covering problem (the problems above are also) and is NP-complete.

Do you really, really need an optimal solution? Research I did years back shows that greedy list scheduling types of algorithms usually give very good, but not optimal, results, and do it in linear time.

I dunno, I don’t think it’s a cover problem. The OP isn’t talking about discarding elements of the domain or codomain, but rather consolidating rules to take up fewer lines of code. It’s not entirely well-defined at this point.

Solving for an “optimal” solution is NP-complete, I believe. But finding a pretty-good answer you can probably do in polynomial time. I think I came up with a decent solution (i.e., reinvented the wheel) for something like this, though I’ll have to go through my notes to find what I came up with. The biggest issue was dealing with the fact that spatial locality is not relevant. Specifically, suppose we swap rows A and D with each other. We’d get something like:



  1 2 3 4
D 0 0 1 1
B 1 1 1 0
C 0 0 1 0
A 1 1 0 0


I’ll try to post something later today.

Okay, I think this was the algorithm:

  1. Pick a column.
  2. Pick all the rows that have that column set.
  3. AND together those rows’ bit vectors.
  4. Emit one rule: all of those rows => all of the columns still set.
  5. Delete those columns from the rows picked in step 2.
    5a) (Optional) Delete any rows that are now all 0.
  6. Repeat until no set bits remain.

So for this example, we first pick column 1.
Rows A and B have column 1 set.
ANDing their bit vectors together leaves columns 1 and 2.
Add set (A, B) => (1, 2)
Delete columns 1 and 2.
Delete row A.

Pick column 3.
Rows B, C, and D have column 3 set.
ANDing their bit vectors together leaves only column 3.
Add set (B, C, D) => (3)
Delete column 3.
Delete rows B and C.

Pick column 4.
Row D has column 4 set.
ANDing it is trivial, and only column 4 is set.
Add set (D) => (4)
Delete column 4.
Delete row D.

Done.
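
If it helps, here is the whole thing as a rough Python sketch (my transcription of the steps above; I clear the covered columns per participating row, which matches the walkthrough and avoids dropping a column that some other row still needs):

def compact(mapping):
    # Greedy rule compaction per the numbered steps above. Fast, but
    # not guaranteed minimal. mapping: dict of row -> set of columns.
    remaining = {row: set(cols) for row, cols in mapping.items()}
    rules = []
    while any(remaining.values()):
        # 1. Pick a column that still has a bit set somewhere.
        col = min(c for cols in remaining.values() for c in cols)
        # 2. Pick all the rows that have that column set.
        rows = [r for r, cols in remaining.items() if col in cols]
        # 3. AND those rows' bit vectors (set intersection here).
        common = set.intersection(*(remaining[r] for r in rows))
        # 4. Emit one rule: all of those rows => the shared columns.
        rules.append((sorted(rows), sorted(common)))
        # 5/5a. Clear the shared columns from the rows just used; rows
        # left with nothing set simply never get picked again.
        for r in rows:
            remaining[r] -= common
    return rules

print(compact({'A': {1, 2}, 'B': {1, 2, 3}, 'C': {3}, 'D': {3, 4}}))
# [(['A', 'B'], [1, 2]), (['B', 'C', 'D'], [3]), (['D'], [4])]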

It’s of order O(R*C + C^2), so you can optimize each step by swapping rows and columns if the number of rows is less than the number of columns.