Forming groups while avoiding some pairings - is this combinatorics? No idea how to calculate...

I have a set of 120 items from which I need to form 30 groups of 4 items.

But there are items among the 120 that should not wind up in the same group. I don’t know how to calculate whether satisfying that condition is trivial or impossible.

How many random rules of the form “Item #X cannot be grouped with Item #Y” can I tolerate before it starts to become difficult to do the grouping?

I realize there are pathological rule combinations, like if one item is unpairable with every single other item. My interest is more in having some idea if having 10 random such rules gets iffy, or is it 100 or 1000 or a million or a billion rules or what? Not sure where to even start to get some sense of this.

Any advice would be greatly appreciated, thanks!!

So close and yet so far …

So: n items, and you want to form k groups of n/k items each, so that certain pairs aren’t in the same group.

If you drop the “n/k” part, this is graph coloring. Construct a graph. The nodes are the items. The pairs that shouldn’t be in the same group are connected by edges.

Just setting k to as small as 3 makes this NP-Complete.

But you seem to have a large k (which is very good, it makes things easier); OTOH you want even-sized groups (which makes things harder); and another wrinkle is that the group sizes are small (I think that’s bad, maybe).

I suspect that by “padding” the graph (adding one big clique, a lot of links from most nodes in the clique to the regular nodes, etc.) you can convert your problem into plain graph coloring.

If this is the case, you’re stuck-ish with a problem that is NP-Complete in general. While that means you aren’t going to come up with a general efficient solution, there are “outs” of various sorts.

Approximation algorithms exist for various graph-coloring problems, and they might work well here given the large k and hence small n/k.

But just brute force might be the best use of your time.
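To make the graph-coloring framing concrete, here is a minimal sketch in Python using networkx. The forbidden pairs are made up, and greedy coloring ignores the even-group-size requirement, so treat it as illustration only:

```python
import networkx as nx

# Hypothetical forbidden pairs; in the real problem there are 120 items.
forbidden = [(0, 1), (0, 2), (5, 17), (5, 44)]

G = nx.Graph()
G.add_nodes_from(range(120))  # one node per item
G.add_edges_from(forbidden)   # one edge per forbidden pairing

# Greedy coloring: each color class is one group. Note this does NOT
# enforce that every group have exactly 4 members.
coloring = nx.greedy_color(G, strategy="largest_first")
print(max(coloring.values()) + 1, "colors (groups) used")
```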

If the number of forbidden-pairings is relatively small you should be able to do this quite easily:

Start forming groups of 4 wherever possible, including individuals that have one or more forbidden others (the forbidden others being excluded from that group, obviously).

When you get down to the last few, you may find the conditions difficult to satisfy, but by selectively swapping a “problematic” individual with another individual in a previously assigned group, you should get there fairly quickly.
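In rough Python, the idea might look like this sketch. The forbidden list is hypothetical, and the repair step here swaps with random members rather than choosing them selectively:

```python
import random

def make_groups(n=120, size=4, forbidden=(), max_swaps=10_000):
    """Greedy fill, then repair by swapping stuck individuals into
    already-formed groups. `forbidden` is a list of (item, item) pairs."""
    bad = {frozenset(p) for p in forbidden}

    def clashes(item, others):
        return any(frozenset((item, o)) in bad for o in others)

    pool = list(range(n))
    random.shuffle(pool)
    groups, stuck = [], []
    for item in pool:
        for group in groups:
            if len(group) < size and not clashes(item, group):
                group.append(item)
                break
        else:
            if len(groups) < n // size:
                groups.append([item])
            else:
                stuck.append(item)

    # Repair phase: place a stuck item directly if possible; otherwise
    # swap it with a random member of a random group and retry the evictee.
    for _ in range(max_swaps):
        if not stuck:
            return groups
        item = stuck.pop()
        for group in groups:
            if len(group) < size and not clashes(item, group):
                group.append(item)
                break
        else:
            g = random.choice(groups)
            i = random.randrange(len(g))
            if not clashes(item, g[:i] + g[i + 1:]):
                stuck.append(g[i])
                g[i] = item
            else:
                stuck.append(item)
    return None  # gave up
```

With only a handful of rules, something like `make_groups(forbidden=[(1, 5), (2, 9)])` should succeed almost immediately; the repair loop only starts to matter as the rule count grows.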

OTOH, if there are relatively many forbidden pairings, this could indeed be difficult (if not impossible).

Can you give us more detail? Can you estimate the number of forbidden-pairings?

Isn’t this sort of what happens with license plates? I.e., they want to run through as many combinations as possible, while excluding FUK and the like.

I would bet that this is a relatively sound and efficient heuristic even if the set is forbidden-pair heavy:

Choose an individual with at least one forbidden-other. Place that pair side-by-side. Find a forbidden-other for the *second* individual. Place it so you now have a line of three. Continue like that until you have 30 items in a line. If at any moment the last individual has no forbidden-others remaining, choose another individual with at least one forbidden-other (if there are no such individuals you can place what’s left in any order you like).

Start a new line under this old one. Follow the same procedure as the first line with the added constraint that individuals cannot be forbidden-others of an individual in the line above.

Continue with the third and fourth lines similarly.

If this gets difficult (i.e. blocked) before you’ve started line 4, my guess is you will have a long slog to find a good partitioning. If you only get blocked half-way through line 4, my guess is that swapping remaining individuals with previously placed individuals will take you to a solution.

WAG, the above algorithm would probably tolerate a few hundred fairly distributed forbidden-pairings.

I wrote a quick simulation to get a feel for the problem.

Given 120 distinct items to be put evenly into 30 bins with p forbidden pairings, what is the probability that randomly placing items in bins avoids all forbidden pairings? Here are the success rates for 0 <= p <= 24, at 10,000 trials for each value of p.


p success rate
0 1.0
1 0.9753
2 0.9489
3 0.9298
4 0.9028
5 0.8782
6 0.8544
7 0.8354
8 0.8215
9 0.7855
10 0.7698
11 0.761
12 0.7379
13 0.7164
14 0.6909
15 0.6764
16 0.6603
17 0.6447
18 0.6265
19 0.617
20 0.6094
21 0.5836
22 0.5705
23 0.56
24 0.5392

I know this isn’t exactly what you’re looking for, but it may provide some insight.

Let me know if you’d like me to extend the table or provide the source code (Python).
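For reference, a reconstruction of such a simulation might look like this (this is not the actual source offered above; the names and structure are my own):

```python
import random
from itertools import combinations

ALL_PAIRS = list(combinations(range(120), 2))  # 7140 candidate pairs

def trial(p, n=120, size=4):
    """One trial: draw p random forbidden pairs, shuffle the items into
    bins of 4, and report True if no bin holds a forbidden pair."""
    forbidden = set(random.sample(ALL_PAIRS, p))
    items = list(range(n))
    random.shuffle(items)
    for i in range(0, n, size):
        group = sorted(items[i:i + size])
        if any(pair in forbidden for pair in combinations(group, 2)):
            return False
    return True

for p in range(25):
    rate = sum(trial(p) for _ in range(10_000)) / 10_000
    print(p, rate)
```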

I’ve let it run for a while, and we are at around 1 success in 10,000 trials when the number of pairs is around 400. Is that hard?

You can have as many as 6,960 = 120 * 116 / 2 pairs and still succeed (or as few as 120 and fail, as stated in the OP).

Oops…

…as few as 117 and fail (an item forbidden to pair with 117 others has only 2 permissible partners left, one short of filling a group of 4).

Of course, that’s just for random groupings. It’s probably a lot easier if you apply a simple algorithm like “if there are any forbidden pairings, swap someone in a forbidden pairing with a random person from another group”. And of course, even that will still leave some false negatives.

If we’re forced to use a random arrangement, it is easy to calculate the chance of success exactly. There are C(120,2) = 7140 pairs, of which 120 * 116 / 2 = 6960 are problem-free (each item shares a group with only 3 others, leaving it 116 problem-free partners). The chance that k random pairings are all trouble-free is C(6960, k) / C(7140, k).

For k = 1 this is .9748; k = 24 is .5413; k = 400 is .0000272.
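In Python (3.8+, for math.comb), that same calculation is a one-liner per k:

```python
from math import comb

def p_all_clear(k, n=120, size=4):
    """Chance that k random forbidden pairs all straddle group boundaries."""
    pairs = comb(n, 2)                    # 7140 pairs in total
    shared = (n // size) * comb(size, 2)  # 180 pairs share a group
    return comb(pairs - shared, k) / comb(pairs, k)

for k in (1, 24, 400):
    print(k, p_all_clear(k))  # ~0.9748, ~0.5413, ~2.72e-5
```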

The numbers you report for k=1 and k=24 are close to the above (C(6960, k) / C(7140, k)) but your estimate for k = “around” 400 is high.

Just to clarify (and as Chronos points out), this is all based on a random rearrangement with no attempt to dodge forbidden pairings. If you are allowed to rearrange appropriately, success rate when k=400 increases from 0.0027% to 99.9% or more.

I wonder if this is one of those problems where there is a fairly crisp threshold T such that when, say, k < T-10, rearrangement is almost always possible, but when k > T+10, successful rearrangement is very seldom possible. I’m pretty sure T is larger than 600, but I’ve no idea how much larger. T is the number OP asks for.

A good estimate of T will be useful if/when we organize an SDMB contest to build the fastest rearranger for Napier’s problem! :slight_smile:

Wouldn’t it be easier to just calculate both counts first and then do:

Acceptable groups = all possible groups - forbidden groups

It’s been 40 years since I took combinatorics, but…

Yes. That is the standard method.

The number of arrangements of k of n items is n!/(n-k)!
If order is not important (you said “groups”, which to me implies no order), it’s n!/((n-k)!k!)

Now if a and b cannot be together, subtract all groups where a and b are both part of the group. Instead of picking k items, you are picking (k-2) from the remaining (n-2) items, since the membership of a and b is fixed:

(n-2)!/((n-k)!(k-2)!)
Now, if you have two such “forbidden groupings” (ab, cd), then you also have to account for the groups where all 4 appear: those were subtracted twice, so add the (k-4) case back in.
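A quick numeric check of that inclusion-exclusion for two disjoint forbidden pairs ab and cd, using Python’s math.comb:

```python
from math import comb

n, k = 120, 4
total     = comb(n, k)          # all possible groups of 4
with_ab   = comb(n - 2, k - 2)  # groups containing both a and b
with_cd   = comb(n - 2, k - 2)  # groups containing both c and d
with_both = comb(n - 4, k - 4)  # all four together: subtracted twice above
print(total - with_ab - with_cd + with_both)  # acceptable groups
```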

If there’s overlap (e.g. ab and bc), it’s more interesting. But then the process is manual, because there’s no simple general rule for the exclusions; for massive numbers of pairs and weird overlaps, it would be unwieldy.
An interesting and deeper case would be, for example, the pick-6-of-49 lottery with the added rule that no two numbers may end in the same digit.

Then you can say “I pick one of 49 numbers at random.” Let’s say, 36.
For the next number, I exclude the 4 other numbers ending in the same digit (6, 16, 26, 46), so I can only pick one of 44, not 48. Let’s say I pick 22; that also excludes 2, 12, 32, 42.
Now I can only pick 1 of 39: the numbers that don’t end in 2 or 6.

And so on… the total possible picks are 49x44x39x34x29x24 instead of 49!/(49-6)!
If the order of the picks does not matter, divide by 6!
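Here is that arithmetic in Python, with one caveat: the running product assumes every pick excludes 4 same-digit companions, but in 1-49 the numbers ending in 0 have only three companions, so the product is a simplification. An exact count by digit classes is included for comparison:

```python
from itertools import combinations
from math import factorial, prod

# The simplified model above: every pick removes itself plus 4 others.
ordered, available = 1, 49
for _ in range(6):
    ordered *= available
    available -= 5
print(ordered)                 # 49*44*39*34*29*24 = 1,989,763,776 ordered picks
print(ordered / factorial(6))  # not an integer: the model is only approximate

# Exact count: choose 6 of the 10 last-digit classes, one number from each.
# In 1..49 the digit-0 class has 4 members; digits 1-9 have 5 each.
sizes = [4] + [5] * 9
print(sum(prod(c) for c in combinations(sizes, 6)))  # 2,887,500 unordered
```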

So you can see that how you define the exclusion pairs goes a long way toward making the math simpler.

I should have said “less than” instead of “around”. In the neighborhood of k = 400 (I used p instead of k in my post for anyone trying to link this together) it was much more common to have zero successes in 10,000 trials than one success.

The T sought by OP may be about 4900. Using the most obvious hill-climbing for rearrangement I observed the following success rates:
k = 4700 –> 98%
k = 4800 –> 92%
k = 4900 –> 76%
k = 5000 –> 41%
k = 5100 –> 21%
k = 5200 –> 2%

I allowed 1 million probes (occasionally almost that many were used!). I wrote the simulator to allow a crude form of annealing but found that a temperature of absolute zero worked best.
It’s quite likely there are better algorithms with a higher success rate.

I worked thru the graph coloring reduction details and yeah. This is solidly in the bad range of NP-Complete.

No real hope of using Dynamic Programming, AuxPDAs, etc.

Hill climbing is an interesting suggestion.

There’s also simulated annealing. Think of starting this way: just randomly assign items into groups of the right size. (Pick an item, pick an unfilled group, etc.)

Go thru the not-allowed pairs that ended up in the same group. Pick a modest percentage of those and randomly swap their members with items in other groups. Lather, rinse, repeat. A heuristic for choosing the swaps would help.

As for the OP’s question about when the going gets rough: for the above, it would be when the percentage you swap matches, on average, the percentage of new not-allowed pairs the swaps create. I.e., the process stops making progress.
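A bare-bones sketch of that loop (group size 4 is hardwired, and swap selection is uniform rather than guided by a heuristic; with temp = 0 this reduces to the plain hill climbing that reportedly worked best):

```python
import math
import random
from itertools import combinations

def conflicts(groups, bad):
    """Count forbidden pairs that share a group."""
    return sum(1 for g in groups for pair in combinations(g, 2)
               if frozenset(pair) in bad)

def anneal(groups, bad, temp=1.0, cooling=0.9999, steps=200_000):
    """Repeatedly swap one member each of two random groups; accept a
    worse swap with probability exp(-delta/temp)."""
    cur = conflicts(groups, bad)
    for _ in range(steps):
        if cur == 0:
            break
        g1, g2 = random.sample(range(len(groups)), 2)
        i, j = random.randrange(4), random.randrange(4)
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        new = conflicts(groups, bad)
        delta = new - cur
        if delta <= 0 or (temp > 0 and random.random() < math.exp(-delta / temp)):
            cur = new  # keep the swap
        else:
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]  # undo
        temp *= cooling
    return groups, cur
```

Seeding it with a random partition of range(120) into 30 lists of 4, plus a random forbidden set, reproduces the experiments above, modulo the cleverness of the swap selection.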

septimus, do I understand you correctly, that you just wrote a simulated annealing many-dimensional optimization program, just because some random guy on the Internet was idly curious about something? Color me impressed.

I suspect that this is because of the discrete nature of the problem: The number of collisions is always an integer.

The “optimizer” mentioned before selected two random groups and looked for the best of 16 possible swaps, accepting it as long as the number of forbidden links didn’t increase. (Simulated annealing would mean accepting some negative-valued swaps.) A million such swaps were investigated before giving up.

The runs were time-consuming, so I only checked 100 or 200 random sets of forbiddings. Thus the numbers posted above had a largish margin of error.

This last statement was an understatement. I modified the software so that a small percentage of the time it selects three random groups and takes the best of the 64 possible 3-way swaps. With this change the performance at k = 5100 improved from 16% to about 95%! But it’s still likely that further large improvements are possible.
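For the curious, a sketch of such a 3-way move (this is my reconstruction, not the actual simulator: it tries all 64 rotations of one member from each of three groups of 4 and applies the best one if it doesn’t make things worse):

```python
from itertools import combinations, product

def group_conflicts(group, bad):
    """Forbidden pairs inside a single group."""
    return sum(1 for p in combinations(group, 2) if frozenset(p) in bad)

def best_three_way(groups, bad, g1, g2, g3):
    """Best of the 64 cyclic 3-way swaps among groups g1, g2, g3."""
    def cost():
        return sum(group_conflicts(groups[g], bad) for g in (g1, g2, g3))

    best, best_move = cost(), None
    for i, j, k in product(range(4), repeat=3):
        a, b, c = groups[g1][i], groups[g2][j], groups[g3][k]
        groups[g1][i], groups[g2][j], groups[g3][k] = c, a, b  # rotate
        if cost() < best:
            best, best_move = cost(), (i, j, k)
        groups[g1][i], groups[g2][j], groups[g3][k] = a, b, c  # undo
    if best_move is not None:
        i, j, k = best_move
        a, b, c = groups[g1][i], groups[g2][j], groups[g3][k]
        groups[g1][i], groups[g2][j], groups[g3][k] = c, a, b
    return best
```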

I find it fun, like doing a crossword puzzle or composing a political diatribe for SDMB. :stuck_out_tongue: I’ve much practice at this sort of thing and crank out the necessary code with better speed and accuracy than when writing SDMB messages! I spent much more time steering the runs and tuning parameters than writing the 100+ lines of code.

Considering how much prior practice I’ve had, I should be embarrassed. Replacing a 2-swap optimizer with a 3-swap optimizer is something I’ve often done before. If I were less lazy or less short-sighted, I’d have coded it that way from the beginning.

I think your comment about annealing is not quite correct; I know how to make the annealing more proper and more likely to work, but was too lazy to bother for a non-paying client. :stuck_out_tongue: For discrete optimizations of this type, variations of Lin-Kernighan algorithms are popular and usually better than annealing; that I didn’t use them here is due, again, to laziness and incompetence.

Sorry, I’m having trouble following what’s being calculated here. What is meant by a “problem-free pairing”?

My only further contribution is to note, along such lines as ftg touched upon before, that this is the problem of equitably 30-coloring a 120-vertex graph, and thus the results and conjectures found when Googling “equitable coloring” may be of use to you.
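As it happens, networkx (2.4+) ships an equitable-coloring routine, so a sketch of that last suggestion is short. The forbidden pairs are made up, and note the routine refuses unless the number of colors exceeds the maximum degree, i.e. no item may be forbidden with 30 or more others when asking for 30 groups:

```python
import networkx as nx
from collections import defaultdict

# Hypothetical forbidden pairs; the real instance has 120 items, 30 groups.
forbidden = [(0, 1), (0, 2), (5, 17), (5, 44)]

G = nx.Graph()
G.add_nodes_from(range(120))
G.add_edges_from(forbidden)

# Equitable coloring: color classes differ in size by at most 1, so with
# 120 nodes and 30 colors every class has exactly 4 members.
coloring = nx.equitable_color(G, num_colors=30)

groups = defaultdict(list)
for item, color in coloring.items():
    groups[color].append(item)
print(sorted(len(g) for g in groups.values()))  # thirty 4s
```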