Assume your domain is the integers from 1 through 45. Your task is to create groups of 5 numbers at a time, such that all groups of 3 (unordered) numbers are represented within some group of five.
One group of 5 numbers is chosen without replacement from the 45 possibles. The next group of 5 is also chosen without replacement from the full 45 numbers. So there are no duplicate numbers WITHIN one group of 5, but duplicates are allowed in different groups of 5.
For example, let’s take the group of 5 numbers: 1, 2, 3, 4, 5.
This group of 5 contains the following groups of 3 numbers:
1 2 3; 1 2 4; 1 2 5; 1 3 4; 1 3 5; 1 4 5
2 3 4; 2 3 5; 2 4 5; 3 4 5;
The number of groups of 3 within a group of 5 is “5 choose 3”, or (5!) / ((3!) * (2!)) = 10
So my questions are:
What is the minimum number of groups of 5 needed to encompass all possible groups of 3 numbers from the numbers 1 - 45? And how is this number calculated?
How do you then generate all these groups of 5 numbers? What algorithm do you use?
How do you generalize this result to, say, all groups of 2 or 4 numbers?
Thanks,
J.
p.s., no, this isn’t a homework problem. It’s just something I’m trying to figure out for myself. And unfortunately, my statistics knowledge has been lost in long term storage.
Start by generating the appropriate De Bruijn sequence. Construct groups of five by taking the first through fifth elements, the second through sixth elements, etc. until you’ve exhausted the sequence. I’m not claiming that this is the minimal set of groups of five elements, but it should work.
I don’t think I understand your setup. You say you choose the groups of 5 “without replacement” from the total 45, yet every possible group of 3 should be contained within one of these groups of 5? How could that possibly work? Surely, some of these groups of 5 will have to overlap, no?
ETA: Nevermind, I read more carefully. No duplicates within a group, but duplicates allowed “between” groups. Sorry, I should’ve seen it; you were quite explicit.
If you’re only trying to ensure that all possible pairs appear at least once in a subset, rather than all triplets, this called a block design in combinatorics. Specifically, you want a “balanced incomplete block design” of type (b, 45, r, 5, lambda). If you look at the equations near the top of the Wiki page, you’ll see that you end up with r = 11*lambda and b = 9r; this means that the smallest block design must have 99 subsets.
Further down in the wiki article, there’s a bit about the generalization to t-designs; your original problem is a 3-design with (v,k,lambda) = (45,5,lambda). If such a design exists for lambda = 1, and I’m interpreting those equations correctly, it has to have b[sub]0[/sub] = 1419 sets in it.
The actual construction of such a design is another question, though. I’ll ponder it a little more during the day today if I have time, but hopefully I’ve given you something to get the ball rolling if you want to research this further.
The block-design inequalities provide lower bounds, but these are not always achievable. It’s not too hard to show that they can’t be satisfied in this case: a 1419-set solution doesn’t exist. The bound of 1419 assumes that each subset of three elements occurs in exactly one 5-element subset (= “block”), so we just have to show that some 3-subsets must occur in more than one block.
There are 43 distinct subsets of three elements where two of the elements (call them x and y) are given. Each block containing both x and y contains three other elements, so it provides exactly three of these 3-subsets. So the number of such subsets generated by any list of blocks is divisible by 3. Since 43, the number of distinct 3-subsets containing x and y, is not divisible by 3, at least one of these 3-subsets must be repeated.
Clearly then we need at least 15 blocks containing x and y, giving 45 3-subsets (with two repeats). This improves the lower bound (45 choose 3)/(5 choose 3)=1419 to 1419*45/43=1485. I don’t know if this bound is achievable either, though.
This, of course, means that you’ll have what’s called a “covering design” — specifically, a design where all subsets of a given size occur at least lambda times, rather than exactly lambda times. Believe it or not, there’s a catalog of these available online, complete with a size-1617 covering design for t = 3, k = 5, v = 45.
Now that I’ve had a chance to look at this, no, this isn’t what I’m looking for. The De Bruijn sequence is meant for ordered sequences, whereas my problem deals with unordered groups. (For me, 1 2 and 2 1 are the same group, whereas they are 2 different sequences in De Bruijn.)