Forming groups while avoiding some pairings - is this combinatorics? No idea how to calculate...

I have a set of 120 items from which I need to form 30 groups of 4 items.

But there are items among the 120 that should not wind up in the same group. I don’t know how to calculate whether satisfying that condition is trivial or impossible.

How many random rules of the form “Item #X cannot be grouped with Item #Y” can I tolerate before it starts to become difficult to do the grouping?

I realize there are pathological rule combinations, like if one item is unpairable with every single other item. My interest is more in having some idea if having 10 random such rules gets iffy, or is it 100 or 1000 or a million or a billion rules or what? Not sure where to even start to get some sense of this.

Any advice would be greatly appreciated, thanks!!

So close and yet so far …

So: n items, and you want to form k groups of n/k items each, so that certain pairs aren’t in the same group.

If you drop the “n/k” part, this is graph coloring. Construct a graph. The nodes are the items. The pairs that shouldn’t be in the same group are connected by edges.

Just setting k to as small as 3 makes this NP-Complete.

But you seem to have a large k (which is very good, it makes things easier); OTOH you want even-sized groups (which makes things harder); and another wrinkle is that the group sizes are small (I think that’s bad, maybe).

I suspect that by “padding” the graph (adding one big clique, a lot of links from most nodes in the clique to the regular nodes, etc.) you can convert your problem into plain graph coloring.

If this is the case, you’re stuck-ish with a problem that is NP-Complete in general. While that means you aren’t going to come up with a general efficient solution, there are “outs” of various sorts.

Approximation algorithms exist for various graph-coloring problems, and they might work well here given the large k and hence small n/k.

But just brute force might be the best use of your time.
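To make the graph-coloring framing concrete, here is a minimal sketch in Python using networkx. The forbidden pairs are made up, and greedy coloring ignores the even-group-size requirement, so treat it as illustration only:

```python
import networkx as nx

# Hypothetical forbidden pairs; in the real problem there are 120 items.
forbidden = [(0, 1), (0, 2), (5, 17), (5, 44)]

G = nx.Graph()
G.add_nodes_from(range(120))  # one node per item
G.add_edges_from(forbidden)   # one edge per forbidden pairing

# Greedy coloring: each color class is one group. Note this does NOT
# enforce that every group have exactly 4 members.
coloring = nx.greedy_color(G, strategy="largest_first")
print(max(coloring.values()) + 1, "colors (groups) used")
```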

If the number of forbidden-pairings is relatively small you should be able to do this quite easily:

Start forming groups of 4 wherever possible, including individuals that have one or more forbidden others (the forbidden others being excluded from that group, obviously).

When you get down to the last few, you may find the conditions difficult to satisfy, but by selectively swapping a “problematic” individual with another individual in a previously assigned group, you should get there fairly quickly.
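In rough Python, the idea might look like this sketch. The forbidden list is hypothetical, and the repair step here swaps with random members rather than choosing them selectively:

```python
import random

def make_groups(n=120, size=4, forbidden=(), max_swaps=10_000):
    """Greedy fill, then repair by swapping stuck individuals into
    already-formed groups. `forbidden` is a list of (item, item) pairs."""
    bad = {frozenset(p) for p in forbidden}

    def clashes(item, others):
        return any(frozenset((item, o)) in bad for o in others)

    pool = list(range(n))
    random.shuffle(pool)
    groups, stuck = [], []
    for item in pool:
        for group in groups:
            if len(group) < size and not clashes(item, group):
                group.append(item)
                break
        else:
            if len(groups) < n // size:
                groups.append([item])
            else:
                stuck.append(item)

    # Repair phase: place a stuck item directly if possible; otherwise
    # swap it with a random member of a random group and retry the evictee.
    for _ in range(max_swaps):
        if not stuck:
            return groups
        item = stuck.pop()
        for group in groups:
            if len(group) < size and not clashes(item, group):
                group.append(item)
                break
        else:
            g = random.choice(groups)
            i = random.randrange(len(g))
            if not clashes(item, g[:i] + g[i + 1:]):
                stuck.append(g[i])
                g[i] = item
            else:
                stuck.append(item)
    return None  # gave up
```

With only a handful of rules, something like `make_groups(forbidden=[(1, 5), (2, 9)])` should succeed almost immediately; the repair loop only starts to matter as the rule count grows.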

OTOH, if there are relatively many forbidden pairings, this could indeed be difficult (if not impossible).

Can you give us more detail? Can you estimate the number of forbidden-pairings?

Isn’t this sort of what happens with license plates? I.e., they want to run through as many combinations as possible, while excluding FUK and the like.

I would bet that this is a relatively sound and efficient heuristic even if the set is forbidden-pair heavy:

Choose an individual with at least one forbidden-other. Place that pair side-by-side. Find a forbidden-other for the *second* individual. Place it so you now have a line of three. Continue like that until you have 30 items in a line. If at any moment the last individual has no forbidden-others remaining, choose another individual with at least one forbidden-other (if there are no such individuals you can place what’s left in any order you like).

Start a new line under this old one. Follow the same procedure as the first line with the added constraint that individuals cannot be forbidden-others of an individual in the line above.

Continue with the third and fourth lines similarly.

If this gets difficult (i.e. blocked) before you’ve started line 4, my guess is you will have a long slog to find a good partitioning. If you only get blocked half-way through line 4, my guess is that swapping remaining individuals with previously placed individuals will take you to a solution.

WAG, the above algorithm would probably tolerate a few hundred fairly distributed forbidden-pairings.

I wrote a quick simulation to get a feel for the problem.

Given 120 distinct items to be put evenly into 30 bins with p forbidden pairings, what is the probability that randomly placing items in bins avoids all forbidden pairings? Here are the success rates for 0 <= p <= 24, at 10,000 trials for each value of p.


p success rate
0 1.0
1 0.9753
2 0.9489
3 0.9298
4 0.9028
5 0.8782
6 0.8544
7 0.8354
8 0.8215
9 0.7855
10 0.7698
11 0.761
12 0.7379
13 0.7164
14 0.6909
15 0.6764
16 0.6603
17 0.6447
18 0.6265
19 0.617
20 0.6094
21 0.5836
22 0.5705
23 0.56
24 0.5392

I know this isn’t exactly what you’re looking for, but it may provide some insight.

Let me know if you’d like me to extend the table or provide the source code (Python).
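For reference, a reconstruction of such a simulation might look like this (this is not the actual source offered above; the names and structure are my own):

```python
import random
from itertools import combinations

ALL_PAIRS = list(combinations(range(120), 2))  # 7140 candidate pairs

def trial(p, n=120, size=4):
    """One trial: draw p random forbidden pairs, shuffle the items into
    bins of 4, and report True if no bin holds a forbidden pair."""
    forbidden = set(random.sample(ALL_PAIRS, p))
    items = list(range(n))
    random.shuffle(items)
    for i in range(0, n, size):
        group = sorted(items[i:i + size])
        if any(pair in forbidden for pair in combinations(group, 2)):
            return False
    return True

for p in range(25):
    rate = sum(trial(p) for _ in range(10_000)) / 10_000
    print(p, rate)
```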

I’ve let it run for a while, and we are at around 1 success in 10,000 trials when the number of pairs is around 400. Is that hard?

You can have as many as 6,960 = 120 * 116 / 2 pairs and still succeed (or as few as 120 and fail, as stated in the OP).

Oops…

…as few as 117 and fail (an item forbidden to pair with 117 others has only 2 permissible partners left, one short of filling a group of 4).

Of course, that’s just for random groupings. It’s probably a lot easier if you apply a simple algorithm like “if there are any forbidden pairings, swap someone in a forbidden pairing with a random person from another group”. And of course, even that will still leave some false negatives.

If we’re forced to use a random arrangement, it is easy to calculate the chance of success exactly. There are C(120,2) = 7140 pairs, of which 120 * 116 / 2 = 6960 are problem-free (each item shares a group with only 3 others, leaving it 116 problem-free partners). The chance that k random pairings are all trouble-free is C(6960, k) / C(7140, k).

For k = 1 this is .9748; k = 24 is .5413; k = 400 is .0000272.
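In Python (3.8+, for math.comb), that same calculation is a one-liner per k:

```python
from math import comb

def p_all_clear(k, n=120, size=4):
    """Chance that k random forbidden pairs all straddle group boundaries."""
    pairs = comb(n, 2)                    # 7140 pairs in total
    shared = (n // size) * comb(size, 2)  # 180 pairs share a group
    return comb(pairs - shared, k) / comb(pairs, k)

for k in (1, 24, 400):
    print(k, p_all_clear(k))  # ~0.9748, ~0.5413, ~2.72e-5
```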

The numbers you report for k=1 and k=24 are close to the above (C(6960, k) / C(7140, k)) but your estimate for k = “around” 400 is high.

Just to clarify (and as Chronos points out), this is all based on a random rearrangement with no attempt to dodge forbidden pairings. If you are allowed to rearrange appropriately, success rate when k=400 increases from 0.0027% to 99.9% or more.

I wonder if this is one of those problems where there is a fairly crisp threshold T such that when, say, k < T-10, rearrangement is almost always possible, but when k > T+10, successful rearrangement is very seldom possible. I’m pretty sure T is larger than 600, but I’ve no idea how much larger. T is the number OP asks for.

A good estimate of T will be useful if/when we organize an SDMB contest to build the fastest rearranger for Napier’s problem! :slight_smile:

Wouldn’t it be easier to just calculate both counts first and then do:

Acceptable groups = all possible groups - forbidden groups

It’s been 40 years since I took combinatorics, but…

Yes. That is the standard method.

The number of arrangements of k of n items is n!/(n-k)!
If order is not important (you said “groups”, which to me implies no order), it’s n!/((n-k)!k!)

Now if a and b cannot be together, subtract all groups where a and b are both part of the group. Instead of picking k items, you are picking (k-2) from the remaining (n-2) items, since the membership of a and b is fixed:

(n-2)!/((n-k)!(k-2)!)
Now, if you have two such “forbidden groupings” (ab, cd), then you also have to account for the groups where all 4 appear: those were subtracted twice, so add the (k-4) case back in.
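A quick numeric check of that inclusion-exclusion for two disjoint forbidden pairs ab and cd, using Python’s math.comb:

```python
from math import comb

n, k = 120, 4
total     = comb(n, k)          # all possible groups of 4
with_ab   = comb(n - 2, k - 2)  # groups containing both a and b
with_cd   = comb(n - 2, k - 2)  # groups containing both c and d
with_both = comb(n - 4, k - 4)  # all four together: subtracted twice above
print(total - with_ab - with_cd + with_both)  # acceptable groups
```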

If there’s overlap (e.g. ab and bc), it’s more interesting. But then the process is manual, because there’s no simple general rule for the exclusions; for massive numbers of pairs and weird overlaps, it would be unwieldy.
An interesting and deeper case would be, for example, the pick-6-of-49 lottery with the added rule that no two numbers may end in the same digit.

Then you can say “I pick one of 49 numbers at random.” Let’s say, 36.
For the next number, I exclude the 4 other numbers ending in the same digit (6, 16, 26, 46), so I can only pick one of 44, not 48. Let’s say I pick 22; that also excludes 2, 12, 32, 42.
Now I can only pick 1 of 39: the numbers that don’t end in 2 or 6.

And so on… the total possible picks are 49x44x39x34x29x24 instead of 49!/(49-6)!
If the order of the picks does not matter, divide by 6!
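Here is that arithmetic in Python, with one caveat: the running product assumes every pick excludes 4 same-digit companions, but in 1-49 the numbers ending in 0 have only three companions, so the product is a simplification. An exact count by digit classes is included for comparison:

```python
from itertools import combinations
from math import factorial, prod

# The simplified model above: every pick removes itself plus 4 others.
ordered, available = 1, 49
for _ in range(6):
    ordered *= available
    available -= 5
print(ordered)                 # 49*44*39*34*29*24 = 1,989,763,776 ordered picks
print(ordered / factorial(6))  # not an integer: the model is only approximate

# Exact count: choose 6 of the 10 last-digit classes, one number from each.
# In 1..49 the digit-0 class has 4 members; digits 1-9 have 5 each.
sizes = [4] + [5] * 9
print(sum(prod(c) for c in combinations(sizes, 6)))  # 2,887,500 unordered
```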

So you can see that how you define the exclusion pairs goes a long way toward making the math simpler.

I should have said “less than” instead of “around”. In the neighborhood of k = 400 (I used p instead of k in my post for anyone trying to link this together) it was much more common to have zero successes in 10,000 trials than one success.

The T sought by OP may be about 4900. Using the most obvious hill-climbing for rearrangement I observed the following success rates:
k = 4700 –> 98%
k = 4800 –> 92%
k = 4900 –> 76%
k = 5000 –> 41%
k = 5100 –> 21%
k = 5200 –> 2%

I allowed 1 million probes (occasionally almost that many were used!). I wrote the simulator to allow a crude form of annealing but found that a temperature of absolute zero worked best.
It’s quite likely there are better algorithms with a higher success rate.

I worked thru the graph coloring reduction details and yeah. This is solidly in the bad range of NP-Complete.

No real hope of using Dynamic Programming, AuxPDAs, etc.

Hill climbing is an interesting suggestion.

There’s also simulated annealing. Think of starting this way: just randomly assign items into groups of the right size. (Pick an item, pick an unfilled group, etc.)

Go thru the not-allowed pairs that ended up in the same group. Pick a modest percentage of those and randomly swap their members with items in other groups. Lather, rinse, repeat. A heuristic for choosing the swaps would help.

As for the OP’s question about when the going gets rough: for the above, it would be when the percentage you swap matches, on average, the percentage of new not-allowed pairs the swaps create. I.e., the process stops making progress.
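A bare-bones sketch of that loop (group size 4 is hardwired, and swap selection is uniform rather than guided by a heuristic; with temp = 0 this reduces to the plain hill climbing that reportedly worked best):

```python
import math
import random
from itertools import combinations

def conflicts(groups, bad):
    """Count forbidden pairs that share a group."""
    return sum(1 for g in groups for pair in combinations(g, 2)
               if frozenset(pair) in bad)

def anneal(groups, bad, temp=1.0, cooling=0.9999, steps=200_000):
    """Repeatedly swap one member each of two random groups; accept a
    worse swap with probability exp(-delta/temp)."""
    cur = conflicts(groups, bad)
    for _ in range(steps):
        if cur == 0:
            break
        g1, g2 = random.sample(range(len(groups)), 2)
        i, j = random.randrange(4), random.randrange(4)
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        new = conflicts(groups, bad)
        delta = new - cur
        if delta <= 0 or (temp > 0 and random.random() < math.exp(-delta / temp)):
            cur = new  # keep the swap
        else:
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]  # undo
        temp *= cooling
    return groups, cur
```

Seeding it with a random partition of range(120) into 30 lists of 4, plus a random forbidden set, reproduces the experiments above, modulo the cleverness of the swap selection.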

septimus, do I understand you correctly, that you just wrote a simulated annealing many-dimensional optimization program, just because some random guy on the Internet was idly curious about something? Color me impressed.

I suspect that this is because of the discrete nature of the problem: The number of collisions is always an integer.

The “optimizer” mentioned before selected two random groups and looked for the best of 16 possible swaps, accepting it as long as the number of forbidden links didn’t increase. (Simulated annealing would mean accepting some negative-valued swaps.) A million such swaps were investigated before giving up.

The runs were time-consuming, so I only checked 100 or 200 random sets of forbiddings. Thus the numbers posted above had a largish margin of error.

This last statement was an understatement. I modified the software so that a small percentage of the time it selects three random groups and takes the best of the 64 possible 3-way swaps. With this change the performance at k = 5100 improved from 16% to about 95%! But it’s still likely that further large improvements are possible.
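For the curious, a sketch of such a 3-way move (this is my reconstruction, not the actual simulator: it tries all 64 rotations of one member from each of three groups of 4 and applies the best one if it doesn’t make things worse):

```python
from itertools import combinations, product

def group_conflicts(group, bad):
    """Forbidden pairs inside a single group."""
    return sum(1 for p in combinations(group, 2) if frozenset(p) in bad)

def best_three_way(groups, bad, g1, g2, g3):
    """Best of the 64 cyclic 3-way swaps among groups g1, g2, g3."""
    def cost():
        return sum(group_conflicts(groups[g], bad) for g in (g1, g2, g3))

    best, best_move = cost(), None
    for i, j, k in product(range(4), repeat=3):
        a, b, c = groups[g1][i], groups[g2][j], groups[g3][k]
        groups[g1][i], groups[g2][j], groups[g3][k] = c, a, b  # rotate
        if cost() < best:
            best, best_move = cost(), (i, j, k)
        groups[g1][i], groups[g2][j], groups[g3][k] = a, b, c  # undo
    if best_move is not None:
        i, j, k = best_move
        a, b, c = groups[g1][i], groups[g2][j], groups[g3][k]
        groups[g1][i], groups[g2][j], groups[g3][k] = c, a, b
    return best
```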

I find it fun, like doing a crossword puzzle or composing a political diatribe for SDMB. :stuck_out_tongue: I’ve much practice at this sort of thing and crank out the necessary code with better speed and accuracy than when writing SDMB messages! I spent much more time steering the runs and tuning parameters than writing the 100+ lines of code.

Considering how much prior practice I’ve had, I should be embarrassed. Replacing a 2-swap optimizer with a 3-swap optimizer is something I’ve often done before. If I were less lazy or less short-sighted, I’d have coded it that way from the beginning.

I think your comment about annealing is not quite correct; I know how to make the annealing more proper and more likely to work, but was too lazy to bother for a non-paying client. :stuck_out_tongue: For discrete optimizations of this type, variations of Lin-Kernighan algorithms are popular and usually better than annealing; that I didn’t use them here is due, again, to laziness and incompetence.

Sorry, I’m having trouble following what’s being calculated here. What is meant by a “problem-free pairing”?

My only further contribution is to note, along such lines as ftg touched upon before, that this is the problem of equitably 30-coloring a 120-vertex graph, and thus the results and conjectures found when Googling “equitable coloring” may be of use to you.
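As it happens, networkx (2.4+) ships an equitable-coloring routine, so a sketch of that last suggestion is short. The forbidden pairs are made up, and note the routine refuses unless the number of colors exceeds the maximum degree, i.e. no item may be forbidden with 30 or more others when asking for 30 groups:

```python
import networkx as nx
from collections import defaultdict

# Hypothetical forbidden pairs; the real instance has 120 items, 30 groups.
forbidden = [(0, 1), (0, 2), (5, 17), (5, 44)]

G = nx.Graph()
G.add_nodes_from(range(120))
G.add_edges_from(forbidden)

# Equitable coloring: color classes differ in size by at most 1, so with
# 120 nodes and 30 colors every class has exactly 4 members.
coloring = nx.equitable_color(G, num_colors=30)

groups = defaultdict(list)
for item, color in coloring.items():
    groups[color].append(item)
print(sorted(len(g) for g in groups.values()))  # thirty 4s
```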