Stats Question

bcflyer · February 28, 2003, 10:03pm

Hi,

I’m trying to figure out the best way to determine the following:

There is a lottery that draws six numbers from 49. This means that there are a total 13,983,816 possible six number combinations.

I’m trying to determine how many of these 13,983,816 combinations have never had 3 numbers (or more) “come up”.

There have been a total of 1,992 draws since the inception of the lottery.

I have access to all of the data in raw format.

My question is, what is the best algorithim or method to determine the result?

Thank you very much!

bup · February 28, 2003, 10:17pm

If you have access to all the data, then there’s no ‘statistics’ or probability needed - write a program that has an array (or multiple arrays) of all the possible combinations - a counter cell for each one.

Then, for each of the 1992 draws, ‘vote’ for all the combinations that had 3 or more of those numbers. There should be 654 = 120 counters you increment for each drawing.

Then, at the end, go through the array, counting how many array cells never got any votes.

I fear, though, that you may be asking because you believe you’re going to devise a smart system that either likes numbers that have come up, or haven’t. Neither would work.

bcflyer · February 28, 2003, 10:24pm

Thanks for the reply. I guess what I was asking is what the most efficient way of pouring through the data is.

This is for curiosity only and is not being used to devise a system. It’s merely to settle a bet.

bup · February 28, 2003, 10:40pm

Crud. It’s tougher than I said.

For each drawing number, you choose 3 of them, and then come up with all the combos that include those three - each drawing should increment 6544645*44 different cells.

And there are more combos than you say, unless I’m missing something. If there are 6 numbers 1-49, and I’m assuming you can’t have the same number twice in a drawing, then it’s 4948474645*44 = 10,068,347,520 different combinations.

But it sounds like you’re describing powerball, which they always say is one in 70 million.

ultrafilter · February 28, 2003, 11:14pm

It’s going to be pretty difficult to fit this in memory, too.

Do you know any graph theory? It might be possible to come up with a non-trivial algorithm that way.

If I’m not mistaken, you might be able to do this with an Euler path in a bipartite graph. I’ll play around with it later.

bcflyer · March 1, 2003, 12:00am

Unfortunately, I don’t know much about graph theory. I did however feel that there was an easier way to do this than to run through each combination. This would be tedious and take a fair bit of time to process.

If you have anything to point me in the right direction, I’d appreciate it.

Thanks!

BC

ultrafilter · March 1, 2003, 1:23am

This might be an excellent time to learn some graph theory. Although my original idea was flawed, here’s an algorithm that will work (at least in principle):

Create a graph (V[sub]1[/sub] [symbol]È[/symbol] V[sub]2[/sub], E). Elements of V[sub]1[/sub] are the combinations that have been chosen, and elements of V[sub]2[/sub] are all possible combinations. There is an edge (v[sub]1[/sub], v[sub]2[/sub]) between v[sub]1[/sub] [symbol]Î[/symbol] V[sub]1[/sub] and v[sub]2[/sub] [symbol]Î[/symbol] V[sub]2[/sub] iff the combinations represented have more than 2 numbers in common.
Examine each v[sub]2[/sub] [symbol]Î[/symbol] V[sub]2[/sub]. If there is a v[sub]1[/sub] [symbol]Î[/symbol] V[sub]1[/sub] adjacent to it, move on. Otherwise, increment a counter, and move on.

This is essentially the same as bup’s idea, but the code ends up being a little cleaner. Be forewarned that this graph is way too large to fit in memory, and that’ll make things interesting.

Punoqllads · March 1, 2003, 2:03am

If I were to do it, I’d do it in C like this:




#include <stdio.h>
#include <stdlib.h>

#define MAXNUM 49

int Select[MAXNUM][MAXNUM][MAXNUM];

int countUnselected()
{
 int i, j, k, count;

 for (count = 0, i = 0; i < MAXNUM; i++)
  for (j = i+1; j < MAXNUM; j++)
   for (k = j+1; k < MAXNUM; k++)
    if (0 == Select*[j][k]) count++;

 return count;
}

void selectThree(int *list)
{
 int i, j, k;

 for(i = 0; i < 6; i++)
  for(j = 0; j < 6; j++)
   for(k = 0; k < 6; k++)
   {
    Select[list*-1][list[j]-1][list[k]-1] = 1;
   }
}

main()
{
 int list[6], lineNo, count;
 char line[256];

 lineNo = 0;

 while(fgets(line, 256, stdin))
 {
  lineNo++;
  if (6 == sscanf(line, "%d %d %d %d %d %d", list, list+1, list+2,
   list+3, list+4, list+5))
  /* &list[0], &list[1],
   &list[2], &list[3], &list[4], &list[5])) */
  {
   selectThree(list);
  } else {
   printf("Bad format on line %d:
\"%s\"
", lineNo, line);
  }
 }

 count = countUnselected();

 printf("There are %d sets of three numbers that haven't been selected.
",
  count);
}

In fact, I did do it like that and, for simple tests, it appears to work.

Punoqllads · March 1, 2003, 2:11am

Ignore that commented part; the original code generated a bus error, and I fiddled with it to figure out what was causing that.

Note to self: never name a C pointer “select” and then try to dereference it

ultrafilter · March 1, 2003, 2:22am

For my solution, btw, you’d want to implement the graph as an adjacency list. It’s very sparse.

Topic		Replies	Views
Statistics question Factual Questions	19	1111	January 4, 2003
Math question -- lottery odds. Is my math correct? Factual Questions	10	1492	May 21, 2003
Powerball odds and a question of statistics... Factual Questions	14	1821	February 19, 2001
Probability/argument with g/f Factual Questions	32	1499	October 9, 2003
Yet another probability question Factual Questions	8	1046	February 5, 2008

Stats Question

Related topics