A statistics question

Let’s say I want to figure out the probability of an event occurring. I’ll use playing solitaire as an example. I’ve played a hundred hands and won eighty of them, so I’ve established that I have an 80% chance of winning any given hand. I want to figure out the probability that I’ll win the next twenty-five hands in a row.

The obvious method is that I have a 0.8 probability of winning an individual hand, so I multiply together twenty-five factors of 0.8, which tells me I have approximately a 0.38% chance of winning twenty-five hands in a row.

But is this the correct procedure? Because suppose I win the next twenty hands in a row (which I would have to do in order to win twenty-five in a row). At that point I’ve won 100 out of 120 hands, which means I’ve won 83.3% of the hands I’ve played, not 80%. Should I still be using my original 80% figure to determine my odds for the next five hands?

Going back the other way, suppose I lost the first hand of solitaire I played. By this logic I should quit the game, because I’ve now established that I have a zero percent chance of winning, and any future games I might win will not change that baseline.

But if I’m playing roulette rather than solitaire, my logic doesn’t make sense. The fact that I’ve won twenty spins of the wheel in a row doesn’t have any effect on my chances of winning the twenty-first spin.

At the time you start trying for your streak, you should use your record to that point to determine the probability. As you play more and more games, you gain more and more information, and therefore your estimate of the probability at any given moment changes.

Under the initial assumption that the probability of winning any particular hand is 0.80, yes, that is the correct calculation (a 0.80^25 chance of winning 25 in a row). Whether winning a whole bunch in a row would cause you to question that assumption is another issue.
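Just to make the arithmetic concrete, here is a one-line check in Python (my illustration, not part of the original posts):

```python
# One-line check of the streak probability, assuming a fixed,
# independent 0.80 chance of winning each hand.
print(0.80 ** 25)  # ~0.00378, i.e. about a 0.38% chance of 25 wins in a row
```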

For a simpler example, suppose I flip a coin ten times in a row, and each time it comes up heads. What’s the probability that the next flip will also be heads?

If it’s a “fair” coin, with an equal chance for heads or tails, that probability is 0.50, and the previous flips are irrelevant. But that string of previous heads might well make me suspect that it’s not a fair coin.

Your value for the probability of an event is only an estimate, the reliability of which depends on whether your trials were actually independent, whether only genuinely random factors influenced the outcome, and how many events and how many of each outcome you observed. You shouldn’t observe one event, or even one hundred, and just accept the ratio of successes as the probability. If you don’t have any prior estimate of the probability based on theoretical considerations (e.g., an expected 1/38 chance of winning on the roulette wheel), then your estimates will be much less certain.
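To put rough numbers on that last point, here is a minimal sketch (my own illustration, using the standard normal approximation for a proportion) of how the uncertainty in an estimated win rate shrinks as the sample grows:

```python
import math

# Rough 95% confidence interval for an estimated win rate, using the
# normal approximation: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
def approx_ci(successes: int, trials: int, z: float = 1.96):
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

# The same 80% win rate is a very different claim at different sample sizes
# (for tiny n the crude approximation even spills past 1).
for n in (10, 100, 10_000):
    print(n, approx_ci(int(0.8 * n), n))
```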

Bayes’ Theorem is a popular tool for estimating probabilities based on observations. It has the advantage of being relatively simple, compared to many of the other tools statisticians use. So if you read that Wikipedia article and your eyes fall out, I don’t recommend statistics as a career for you. :slight_smile:

You are correct. After winning 100 games out of 120, you should raise your estimate of the probability of winning a game to 83.3%.

On the other hand, playing just one game and losing it does not allow you to conclude that your average success rate is zero. It could perfectly well be 0.5, or anything below 1. As a matter of fact, at this stage all you may conclude with reasonable confidence is that your average success rate is below 0.95.
One can go a little further into the math. I assume that you know the binomial distribution B(n,k,p): you have n repetitions of a trial where the chance of success is p, and B(n,k,p) tells you the probability of exactly k successes. Well, Bayes’ theorem tells you that if you play n solitaire games and win k of them, then the probability distribution of p (where p is the average success rate) is P(p) = B(n,k,p)/K.
In this formula, K is a normalization factor which makes the probabilities integrate to 1 over all p (K = the integral of B(n,k,p) over p from 0 to 1, which works out to 1/(n+1)). Notice that the formula P(p) = B(n,k,p)/K assumes a flat prior, i.e. before you play any game, you consider all possible values of p equally probable.
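With that flat prior, the posterior P(p) works out to a Beta(k+1, n-k+1) distribution, so it is easy to explore numerically. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import beta

# With a flat prior, P(p) = B(n,k,p)/K is the Beta(k+1, n-k+1) density.
n, k = 100, 80                      # 80 wins in 100 games
posterior = beta(k + 1, n - k + 1)

print(posterior.mean())             # ~0.794, close to the raw 80% figure
print(posterior.interval(0.95))     # ~(0.71, 0.87) credible interval for p

# After a single lost game (n=1, k=0) the posterior is Beta(1, 2), which
# still leaves room for almost any p: P(p > 0.95) = (1 - 0.95)^2 = 0.25%.
print(beta(1, 2).sf(0.95))          # 0.0025
```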

Notice also that all of the above mathematics assumes that games are independent. This implies, for instance, that your skill remains fixed.
In the case of roulette, things are different, because you just start from the assumption that your chance of success is fixed (1/37) and that spins are independent. In other words, you trust the casino. If you are playing on an old, rusty, wobbling roulette wheel, you may consider using the same reasoning as with your solitaire.

I work this out as a 0.378% chance.

That would pretty much describe me. For example, Oukile’s assumption that I know the binomial distribution is not borne out by the facts.

Suppose I’m doing a series of actions that have some chance of success that can’t be predetermined and can only be deduced by sampling. One week, I perform this action a hundred times and succeed sixty times. Next week, I perform it a hundred more times and succeed eighty times. Do I decide that my overall chance of success is seventy percent? Or can I say that my “real” chances are sixty percent and the second week just represented a streak of “good luck”? Or that my real chances are eighty percent and the first week was a streak of bad luck? Is there a way to make a meaningful distinction between the genuine probability of an event happening and a statistical fluke?

Actually it should. For example, if you spin the wheel 20 times and each time it comes up 22, you need to seriously consider the possibility that the wheel is rigged somehow.

Welcome to the world of “hypothesis testing”. In your example, you can’t be certain of anything except that the long-run probability is neither 100% nor zero. There is an infinitesimal chance, after all, that the real probability is 99% and you’ve been spectacularly unlucky to achieve only 140 successes in 200 tries.

To quantify this, one asserts a “null hypothesis” that the true probability is “x”, and then assesses whether the evidence to date is sufficient to reject the null hypothesis at a given level of confidence. In my example, if my null hypothesis was that the probability of success was 99%, I would find that the evidence allows me to reject that hypothesis with an extremely high degree of confidence. If I chose a more reasonable null hypothesis–such as yours that the true probability is 80%–then I would have a worthwhile test. (Unfortunately I don’t have time right now to do the math.)
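(For anyone who does want to do the math, here is a sketch of that test, assuming SciPy is available; the choice of SciPy’s exact binomial test is mine:)

```python
from scipy.stats import binomtest

# Null hypothesis: the true success probability is 0.80.
# Observed: 140 successes in 200 trials (160 expected under the null).
result = binomtest(k=140, n=200, p=0.80)
print(result.pvalue)  # well below 0.01, so reject the null with high confidence
```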

If you fail to reject the null hypothesis, that doesn’t mean that it’s true. In fact, it almost certainly isn’t true–what is the chance that the “true” probability, for example, is exactly 65.837% or any other number? However, the test allows us, as you put it, to “make a meaningful distinction between the genuine probability of an event happening and a statistical fluke”.

Well, two possibilities:

A - the overall probability remained constant, in which case it is 70%

B - the average success probability varied from week 1 to week 2.

How would you decide which is which? In this specific case, you may use statistics to compare the success rates between the two weeks (i.e. 60% vs. 80%, with n = 100 samples each). A t-test will do the job and tell you that it is very unlikely that the success probability remained constant; so in this example you would reject case A.
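For the curious, here is a sketch of that comparison in Python. I have written it as the standard two-proportion z-test, which is the usual large-sample stand-in for the t-test mentioned above:

```python
import math
from scipy.stats import norm

# Two-proportion z-test: did weeks 1 and 2 share the same success rate?
k1, n1 = 60, 100    # week 1: 60 successes in 100 trials
k2, n2 = 80, 100    # week 2: 80 successes in 100 trials

p_pooled = (k1 + k2) / (n1 + n2)    # 0.70, the estimate under case A
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (k2 / n2 - k1 / n1) / se        # ~3.09
p_value = 2 * norm.sf(abs(z))       # ~0.002, two-sided
print(z, p_value)  # tiny p-value: reject case A (a constant 70% rate)
```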

Notice that there is no ‘the real probability was 60% and the second week was good luck’ option.
It is as simple as ‘either the two weeks are the same or they are not’ (case A or case B).

By the way, with a real probability of 60%, you have virtually no chance of achieving 80 successes out of 100 trials in the second week (for those who are interested, the number of successes in 100 trials is close to a Gaussian with mean 60 and a standard deviation of about 5). So suppose you were playing a roulette game with a 60% success rate (e.g. roll a 10-sided die and bet that the result is 6 or less); if you then observe 80 successes out of 100 trials, you know that there is something wrong with your roulette/die.
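That “virtually no chance” is easy to quantify, e.g. with SciPy’s binomial tail function:

```python
from scipy.stats import binom

# Probability of 80 or more successes in 100 trials when the true rate is 0.60.
print(binom.sf(79, 100, 0.60))  # on the order of 1e-5, a roughly four-sigma fluke
```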
What about a generic solution? For instance, you have a continuous series of trials and you are trying to find out whether they may come from independent trials with the same success rate? Well, things get a little complex here (and I don’t know the whole math of this problem myself). In brief, this has to do with cross-correlation analysis. Yep, at this level the math becomes… significantly complex.

Suppose I have a really large deck of randomly shuffled cards with a mixture of spades and diamonds. I don’t know what the overall mixture is or how many of each are in the deck. But I can turn them over one at a time.

If sixty out of the first hundred cards are spades and eighty out of the second hundred cards are spades, what can I conclude about the overall percentage of spades in the deck? It’s the same deck, so I know that the overall percentage isn’t both sixty and eighty percent. Do I just assume that the biggest sample is the closest to the overall percentage and go with seventy? That seems arbitrary - saying that whatever sample I happen to have checked is the best sample to use to figure the results. If I had stopped at a hundred instead of going on to two hundred, then I would have used the exact same method to determine the overall percentage and obtained a different result.

Or to use a different idea, suppose I’m told that the overall percentages of spades and diamonds in the deck is fifty-fifty. But there are some sections of the deck where the deck is stacked - some of one suit is removed and replaced by the other suit. As I go through the deck flipping over cards, I find a run of cards which has more spades than diamonds. Is there any way to determine if this local preponderance of spades is part of a normal random distribution or if it’s part of the deck that’s been deliberately stacked with more spades?

The proportion in the largest sample (here, all two hundred cards pooled together, giving seventy percent) is the most accurate estimate, assuming that the true distribution is consistent over time.

There is no statistical method which can be used to make that determination with certainty, but you can use the runs test to become suspicious of the idea that the deck is arranged randomly.
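For reference, the Wald-Wolfowitz runs test is simple enough to sketch from scratch (a minimal implementation of mine, assuming a two-symbol sequence such as ‘S’/‘D’ cards):

```python
import math

def runs_test_z(sequence):
    """Wald-Wolfowitz runs test z-score for a two-valued sequence.

    A large |z| means the number of runs is suspiciously far from what a
    random arrangement would produce; too few runs (negative z) suggests
    clustering, such as a stacked section of the deck.
    """
    values = list(sequence)
    a, b = sorted(set(values))          # the two symbols, e.g. 'S' and 'D'
    n1, n2 = values.count(a), values.count(b)
    n = n1 + n2
    # Count runs: maximal stretches of identical consecutive symbols.
    runs = 1 + sum(x != y for x, y in zip(values, values[1:]))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(var)

print(runs_test_z("SSSSSSSSDDDDDDDD"))  # strongly negative: too few runs
print(runs_test_z("SDSDSDSDSDSDSDSD"))  # strongly positive: suspiciously regular
```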

Let’s make it 65-75; I’ll be happier with that.

Basically, you are using all the information that you have at a given time. That sounds pretty fair.

At the time, you had less information. Furthermore, if you draw 65 spades out of 100, you can estimate that the overall percentage of spades is roughly between 55 and 75 (compute your uncertainty as a function of the sample size).
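As a sanity check on those numbers, the usual normal approximation gives about the same interval:

```python
import math

# 65 spades out of 100 cards: approximate 95% interval for the true proportion.
p_hat = 0.65
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / 100)
print(p_hat - half_width, p_hat + half_width)  # ~(0.56, 0.74), roughly 55-75%
```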

Yes, although the general solution (i.e. deciding where these sections begin and end) is a little complex. But to make things simple: if you split your deck into sections of 100 cards, then you may conclude that you have encountered a section where the deck is stacked whenever the number of spades is less than 40 or more than 60 out of 100. The boundaries of these intervals may vary as a function of the number of sections you made, the number of cards in each section, and possibly the expected frequency of these stacks.
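A minimal sketch of that rule (the 100-card sections and the 40/60 thresholds are the ones suggested above; everything else is illustrative):

```python
# Flag 100-card sections whose spade count falls outside [40, 60].
# `deck` is any sequence of 'S'/'D' labels; the thresholds are the
# rule-of-thumb values from the post above.
def flag_stacked_sections(deck, section_size=100, lo=40, hi=60):
    flagged = []
    for start in range(0, len(deck) - section_size + 1, section_size):
        section = deck[start:start + section_size]
        spades = sum(1 for card in section if card == "S")
        if spades < lo or spades > hi:
            flagged.append((start, spades))
    return flagged  # list of (section start index, spade count)

print(flag_stacked_sections("S" * 70 + "D" * 30 + "SD" * 50))  # [(0, 70)]
```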

All I wrote comes from classical t-test statistics and confidence-interval computation methods.

If you do it this way, you run into all of the issues associated with multiple hypothesis tests. Since you can avoid those issues by using the runs test that I linked to earlier, there’s really no good reason to want to deal with them.

Well, this test will tell you whether all elements of a sequence are independent, but not where the stacks are.

You are very right about the multiple hypothesis problem. As a matter of fact, things are even a little more complex than that. Indeed, in our case you know that there should be stacks, so it is not legitimate to detect a stack only when you can reject the null hypothesis. The formulation of the problem would look more like ‘the frequency of stacks is …, the length of the stacks follows the … distribution, etc.’ One could solve the problem using a Hidden Markov Model or some message-passing algorithm (ouch, message-passing inference seems so specialized that I can’t even find a Wikipedia page on it).

By the way, thanks for the link about the runs test. Maybe it will be useful for me one day.