Stats Question


I’m trying to figure out the best way to determine the following:

There is a lottery that draws six numbers from 49. This means that there are a total 13,983,816 possible six number combinations.

I’m trying to determine how many of these 13,983,816 combinations have never had 3 numbers (or more) “come up”.

There have been a total of 1,992 draws since the inception of the lottery.

I have access to all of the data in raw format.

My question is, what is the best algorithim or method to determine the result?

Thank you very much!

If you have access to all the data, then there’s no ‘statistics’ or probability needed - write a program that has an array (or multiple arrays) of all the possible combinations - a counter cell for each one.

Then, for each of the 1992 draws, ‘vote’ for all the combinations that had 3 or more of those numbers. There should be 654 = 120 counters you increment for each drawing.

Then, at the end, go through the array, counting how many array cells never got any votes.

I fear, though, that you may be asking because you believe you’re going to devise a smart system that either likes numbers that have come up, or haven’t. Neither would work.

Thanks for the reply. I guess what I was asking is what the most efficient way of pouring through the data is.

This is for curiosity only and is not being used to devise a system. It’s merely to settle a bet.

Crud. It’s tougher than I said.

For each drawing number, you choose 3 of them, and then come up with all the combos that include those three - each drawing should increment 6544645*44 different cells.

And there are more combos than you say, unless I’m missing something. If there are 6 numbers 1-49, and I’m assuming you can’t have the same number twice in a drawing, then it’s 4948474645*44 = 10,068,347,520 different combinations.

But it sounds like you’re describing powerball, which they always say is one in 70 million.

It’s going to be pretty difficult to fit this in memory, too.

Do you know any graph theory? It might be possible to come up with a non-trivial algorithm that way.

If I’m not mistaken, you might be able to do this with an Euler path in a bipartite graph. I’ll play around with it later.

Unfortunately, I don’t know much about graph theory. I did however feel that there was an easier way to do this than to run through each combination. This would be tedious and take a fair bit of time to process.

If you have anything to point me in the right direction, I’d appreciate it.



This might be an excellent time to learn some graph theory. Although my original idea was flawed, here’s an algorithm that will work (at least in principle):

  1. Create a graph (V[sub]1[/sub] [symbol]È[/symbol] V[sub]2[/sub], E). Elements of V[sub]1[/sub] are the combinations that have been chosen, and elements of V[sub]2[/sub] are all possible combinations. There is an edge (v[sub]1[/sub], v[sub]2[/sub]) between v[sub]1[/sub] [symbol]Î[/symbol] V[sub]1[/sub] and v[sub]2[/sub] [symbol]Î[/symbol] V[sub]2[/sub] iff the combinations represented have more than 2 numbers in common.

  2. Examine each v[sub]2[/sub] [symbol]Î[/symbol] V[sub]2[/sub]. If there is a v[sub]1[/sub] [symbol]Î[/symbol] V[sub]1[/sub] adjacent to it, move on. Otherwise, increment a counter, and move on.

This is essentially the same as bup’s idea, but the code ends up being a little cleaner. Be forewarned that this graph is way too large to fit in memory, and that’ll make things interesting.

If I were to do it, I’d do it in C like this:

#include <stdio.h>
#include <stdlib.h>

#define MAXNUM 49


int countUnselected()
 int i, j, k, count;

 for (count = 0, i = 0; i < MAXNUM; i++)
  for (j = i+1; j < MAXNUM; j++)
   for (k = j+1; k < MAXNUM; k++)
    if (0 == Select*[j][k]) count++;

 return count;

void selectThree(int *list)
 int i, j, k;

 for(i = 0; i < 6; i++)
  for(j = 0; j < 6; j++)
   for(k = 0; k < 6; k++)
    Select[list*-1][list[j]-1][list[k]-1] = 1;

 int list[6], lineNo, count;
 char line[256];

 lineNo = 0;

 while(fgets(line, 256, stdin))
  if (6 == sscanf(line, "%d %d %d %d %d %d", list, list+1, list+2,
   list+3, list+4, list+5))
  /* &list[0], &list[1],
   &list[2], &list[3], &list[4], &list[5])) */
  } else {
   printf("Bad format on line %d:
", lineNo, line);

 count = countUnselected();

 printf("There are %d sets of three numbers that haven't been selected.

In fact, I did do it like that and, for simple tests, it appears to work.

Ignore that commented part; the original code generated a bus error, and I fiddled with it to figure out what was causing that.

Note to self: never name a C pointer “select” and then try to dereference it

For my solution, btw, you’d want to implement the graph as an adjacency list. It’s very sparse.