Wow, a most productive thread! Thank you! I think I am most encouraged by Lance Turbo’s simulation finding that 1/10,000 random groupings still work when there are 400 specified forbidden pairings.
I should have said that I already have an algorithm for creating the groups.
It should be no surprise that the real problem has more details and complications and this is somewhat of a simulation. I am actually trying to create these groups such that they each contain a nice variety of elements, the 120 given elements each having about 6 class variables and 2 continuous variables, the class variables having from 2 to 25 possible values.
My algorithm does an initial group formation based by sorting on the most important variables and then dealing elements into groups around the table like a card game (four card stud with thirty players), so each group has the same number of elements whose most important variable has this value, and the next most important variable has that value, and so forth. The groups thus have a better variety than they would have if created randomly.
Then my algorithm iteratively tries to improve the groups. It has a scoring method that judges the variety in a given group (actually the sum of all the differences between each possible pairings of elements in the group, normalized by how common or rare the differences are). So I score all the groups and then pick two, one of which has a fair or good score and the other of which has a really poor score, and then I try all the possible swappings of one element of each, and of two elements of each. Out of all these possible rearrangements I pick the one with the best combined variety score. I keep iterating this way. Gradually the average score over all groups improves, and the worst case scores improve considerably (because I weight “best combined variety score” to force this).
The whole question of forbidden pairings operates essentially independently of all of the above. But the way I implement it is that when I calculate a variety score for a group, I also check for any of the forbidden pairings, and apply a huge penalty for that one. Such groupings therefore die off as soon as they form, leaving safe groupings to thrive. I should call the “variety score” a “merit score” now, because it represents more than just variety.
My worry is that I am learning more about why we might want to forbid certain pairings, and I can imagine our number of forbidden pairs is going to go up. In past runs it has been less than 10 but I can imagine it becoming 100 or so, maybe more. I’m wondering if there is a catastrophy waiting in the wings. Will my problem become unsolvable soon? Or do I have to have 1000 forbidden pairs, or fantastically more than that, to hit the wall? I might adjust the threshold for how serious a forbidden pairing has to be before I consider it worth avoiding.
It is pretty difficult to do a simulation with my algorithm as it is, for reasons that get messy, but I see my worry as an essentially isolated question. Of course all of this would change quite a lot if the reasons pairings become forbidden has any relationship to the values of all the variables, but I think for physical reasons that it has no relationship or at most very weak relationships.
Thanks for all the thought!