In order to serve that most noble of causes, namely winning an Internet bet, I need to find a way to solve a particular problem by either using an existing program or having someone write some code for me. The first approach is preferable, since I have no idea as to how to do the second.
Here is the problem: I want to create a database of all possible means of n cards drawn without replacement from a hat of m cards numbered x1,x2,x3,x4…etc. So drawing 4 cards from a hat of 200 gives 200x199x198x197 = 1,552,438,800 possible combinations, each with its own mean. I wish then to compare this database to a given number and determine exactly how many of the means are greater or less than that given number.
Can anyone point me in the right direction - such as a way to do this with Excel or something, or how to find someone to write me a program? I have a Statistics101 program with which I can check p means of n cards against a given number, but the p means are drawn randomly rather than systematically. For large n and m, I’d expect to have to let the thing run on a laptop for a looong time, but that’s not a dealbreaker.
It sounds like the problem isn’t that hard of one to solve in any reasonable programming language.
So, you’ve got a set of m cards with some numbers on them x[sub]i[/sub], and you want to consider all sets of n of them. By “mean”, do you mean arithmetic mean? as in: the average?
So, then you’ve got a whole bunch of averages, and you want to know how many of them are larger than some number p?
Do you need to know exactly? If not, then the easiest solution is to just do some random samples and see what fraction are larger than p. You can make this as accurate as you want by increasing the size of the test run.
In principle, this isn’t a hard problem, but the devil is in the details. If you literally want to do all 1.5 billion cases (as opposed to the Monte Carlo sampling (:3= suggests), you’re going to have to use a ton of memory, and most programming environments will make you jump through a bunch of hoops to use that much memory.
If there’s only one given number that you want to compare to (or a small number of them), then you could get away with not actually storing the database, but just looking at one draw at a time, and keeping running totals.
Well, skipping over the limits that **Chronos **mentions, which are going to be an issue no matter how you do it if you really want to use a 200-card deck, you could go through Excel to Visual Basic for Applications. I started poking around with it, and it shouldn’t be too complicated. If no one has a better option, I’ll finish it tomorrow.
If you’re curious about the process, I started out with a macro to create a database for small sets of cards (like, 3 cards from a 6-card deck) and once I’m sure that’s working properly, I’ll probably drop the write portion, and just start counting how many of the means are sufficiently large. Of course, you could also probably just write an algebraic function to determine the answer, but I’m unfamiliar enough with statistics that I like being able to check my answers (at least, the early ones).
Do you have to actually run all cases, or can you just calculate results? In your example, the averages are discrete values from 2.5 (10/4) up to 198.5 (794/4). There’s only one combination that results in 2.5 (1 + 2 + 3 + 4)/4, one that results in 2.75, two that result in 3, three that result in 3.25, etc. Seems to me there should be some standard permutation/combination calucualtion you could use to step through all possible discrete values.
Yes, I want to exhaustively pick n cards and calculate the arithmetic mean of their values. I’ll need to be able to specify what the card values are, since they won’t necessarily follow a nice pattern: 0, 1, 3, 3.45, 6.7, 9, 57, etcetera. I will be comparing the means to only one or two ‘experimental’ numbers. So far I can’t see a reason to actually keep track of the individual means, as long as an exact running tally is kept of whether each mean is greater than the ‘experimental’ number, and it can be shown that every possible combination of n cards has been tested.
(:3= and Chronos have struck right to the heart of the matter: the issue is one of comparing Monte Carlo approximations (for a variable number of samplings) to an exact answer. I’ve already got a way to run Monte Carlo but to settle the issue I need to show what the brute-force result is. For the purposes of this bet, just running Monte Carlo with a number of samplings that is, say, 100 times the number of possible combinations of n cards from m is insufficient.
I have a working program (kind of a fun little challenge). How many numbers (cards) do you have, if not a ridiculous amount, just post them with commas separating them, I’ll copy into the program and run it for you.
I don’t know that it matters, but I forgot to add that some of the cards are repeated multiple times. It’s late here and I’m tired and can’t figure out how the repeated cards affect Peter Morris’s point about ordering.
At any rate, here’s the first set - 72 cards in total - six copies each of the following: 0, 10, 20, 25, 27.5, 29, 30, 30.5, 31, 31.5, 32, and 32.5. Draw 4 cards without replacement and determine the arithmetic mean of their values. Compare with the ‘experimental mean’ of 31.75. What percentage of means are equal to or greater than 31.75?
If that appears doable, the next set gets quite a bit harder: 340 cards made up of 12 groups. Ten copies of 0, then 30 copies each of the following: 10, 20, 25, 27.5, 29, 30, 30.5, 31, 31.5, 32, and 32.5. Draw five without replacement. Compare the means to the ‘experimental mean’ of 32.1. What percentage of means are equal to or greater than the experimental mean?
When I said working program, I may have wanted to qualify that as “it compiled and the combinations looked accurate when tested with 6 cards 3 drawn”. I believe my program is creating the set of combinations not permutations, hopefully that is what you want.
For the first set, this is what I get:
Total combinations=1028790
Total with mean (as in the average of the 4 cards drawn) > =31.75=8706
I’m running the big one (340 with 5 cards drawn). It’s going through about 1 billion combinations per minute, this is probably going to take a while.
It’s at 9 billion and not a single hit yet (my algorithm starts with the first 5 cards selected, then the rightmost card shifts up one position, etc. so it starts with the lowest numbers in your list)
Thanks very much for doing this. The first one (8706/1028790 = 0.008462) matches the Monte Carlo numbers that I was getting (0.00846). I’m not gonna sweat significant digits on this one. Looking great so far!
When you say ‘total combinations’ I take it that that means total number of unique means, so for example all the various ways of drawing four zeros get counted once? That’s fine as long as we can establish somehow what the exact relationship is between unique means and total permutations - the formula for that escapes me at the moment.
This is how my algorithm works (example with 6 cards, 3 drawn at a time):
Card Value 0 0 10 10 20 20
Card Index 0 1 2 3 4 5
Cards Drawn 1st X X X
Cards Drawn 2nd X X X
Cards Drawn 3rd X X X
Cards Drawn 4th X X X
Cards Drawn 5th X X X
Cards Drawn 6th X X X
Cards Drawn 7th X X X
Cards Drawn 8th X X X
Cards Drawn 9th X X X
Cards Drawn 10th X X X
Cards Drawn 11th X X X
Cards Drawn 12th X X X
etc.
It’s not perrmutations because the order of drawing the cards doesn’t matter, it’s just unique combinations of the cards available.
But it’s not unique means because 2 different combinations could have the same mean. For example, in the set you gave me with 10 zero cards, the first 2 sets of 5 cards drawn (1,2,3,4,5 and 1,2,3,4,6) will both have a mean of 0, but they are unique combinations of the cards so they both need to be counted (unless you were looking for something else).
Thirty-six billion thanks for running that for me, RaftPeople! Man, the SDMB is an awesome place. Now I can go off and school Somebody Who Is Wrong on the Internet.