Question About Statistics and Standard Deviation

Say I had a coin-tossing computer program that was rigged so that, instead of 50/50 odds, heads came up 66% of the time and tails came up 33% of the time. However, instead of tossing one coin at a time, it tosses six coins. Regardless of how the tosses are rigged, three heads and three tails is still the most likely outcome since it has the most possible permutations: 20. Four-of-one and two-of-the-other is second most likely with 15 possible permutations. Five-of-one and one-of-the-other has six permutations, and all heads or all tails can only be thrown one way.

My question is this: How many rigged, six-coin tosses would the computer have to throw before standard deviation made it apparent that the game was fixed?

I’m not trying to pick nits, but shouldn’t the total percentage come out to 100%?

How are you scoring this? If you are just looking at the individual coins, it doesn’t matter how many you toss at a time. Each coin is one trial, or am I missing something? Also, you have to define “apparent that the game is fixed”. It is always possible for all coins to come up one side or the other no matter what. You have to set something like a confidence level as your criterion.

First thing, the calculations in the OP aren’t quite right. Consider: if the coins were 99% heads and 1% tails, your most likely outcome would be six heads. In fact, that outcome would be more likely than ALL other combinations together.

To answer the actual question:
A statistician would phrase the question this way:
For n tosses, what is the probability that a fair (50/50) coin, modeled by a binomial distribution, would give me an outcome at least as extreme as the one I’m seeing?

You will also see this phrased as a binomial confidence test against the hypothesis that it is a 50/50 coin.

Once you’ve selected how certain you want to be before you “doubt” that it is a fair coin, then you can determine n, the number of tosses before a particular distribution fails the test.
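As a rough illustration of that procedure, here is a minimal Python sketch. The function names are made up for this example, the test is one-sided (too many heads), and it assumes the observed head count sits exactly at the rigged coin's expected proportion, which real data will not do. It only shows the shape of the calculation, not a proper power analysis.

```python
from math import comb

def p_value_at_least(heads, n, p=0.5):
    """One-sided p-value: P(X >= heads) when X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(heads, n + 1))

def tosses_needed(true_p=2/3, alpha=0.05, max_n=1000):
    """Smallest number of individual coin flips n at which a head count of
    round(true_p * n) would reject the fair-coin hypothesis at level alpha.
    Assumption: the observed count equals the expected count; actual runs
    fluctuate around it."""
    for n in range(1, max_n + 1):
        heads = round(true_p * n)
        if p_value_at_least(heads, n) < alpha:
            return n
    return None

n = tosses_needed()
print("individual flips needed (idealized):", n)
```

Because each six-coin toss contributes six flips, the number of six-coin tosses is roughly that flip count divided by six, under the same idealized assumption.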

If you had coins that came up heads 2/3 of the time and tails 1/3 of the time and tossed six of them, you would get four heads with probability 240/729, five heads with probability 192/729, and three heads with probability only 160/729. You would start to get suspicious rather quickly. Of course, these are only probabilities. You might still get six tails (once in 729 throws).
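Those fractions can be checked with exact arithmetic. The snippet below is just a sketch using Python's standard library; the helper name `heads_distribution` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def heads_distribution(n, p):
    """Probability of each possible head count when tossing n coins
    that each land heads with probability p."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

dist = heads_distribution(6, Fraction(2, 3))
for k in sorted(dist, reverse=True):
    f = dist[k]
    # Fractions are stored in lowest terms; rescale to a denominator of 729.
    over_729 = f.numerator * (729 // f.denominator)
    print(f"{k} heads: {over_729}/729 = {float(f):.4f}")
```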

The point is that you can’t just count. You have to weight the count by the odds.

As others have pointed out: Just because it has the most possible permutations doesn’t make it the most likely, because not all of the permutations are equally likely.

Your situation is a classic example of a binomial distribution: You have a certain, fixed number of trials n (in your example, 6), and a certain fixed probability p of “success” (e.g. “heads”) on each trial (in your example 66% or 2/3).

In that case, the probability of getting X “successes” out of n trials (e.g. coin flips) is $({}_{n}C_{p})\,p^{X}(1-p)^{n-X}$

The ${}_{n}C_{p}$ part counts the number of combinations (not permutations, because the order of heads/tails doesn’t matter, just how many of each there are), and the $p^{X}(1-p)^{n-X}$ part gives the probability of getting any one particular combination of that kind.

I am neither all that smart nor driven to look this up, but I would do either an analysis of variance on the results, or a p-test on the percentages that I was getting as compared to the universe expected of 50%. The resultant value from that analysis would give you a confidence level that the two processes were different; generally p <= 0.05 means you are different to a confidence level of 95%, as I recall.

To use the extreme example, what’s the probability that I’ll be struck by lightning on my way home? Well, there are two possibilities: I’ll get struck, or I won’t. So it’s 50-50.

It’s not the standard deviation that gives the game away here. The best estimate of the probability that any given coin will come up heads is the proportion of coins that come up heads, and in the long run that will converge to the true probability. After some number of tosses, it will be quite clear that the coins are not fair.
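A quick simulation makes the convergence visible. This is only a sketch: the function name, the seed, and the sample sizes are arbitrary choices for illustration, and the rigged probability is taken to be 2/3.

```python
import random

def estimate_heads_probability(num_coins, p_heads=2/3, seed=0):
    """Flip num_coins rigged coins and return the observed proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < p_heads for _ in range(num_coins))
    return heads / num_coins

for num_coins in (60, 600, 6000, 60000):
    print(num_coins, round(estimate_heads_probability(num_coins), 3))
```

The estimates should wander for small samples and settle near 2/3 as the number of coins grows.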

Neither an ANOVA nor a t-test has anything to do with the problem at hand.

Oops! I made a mistake or two! First, the ${}_{n}C_{p}$ should have been ${}_{n}C_{X}$.

And the ${}_{n}C_{X}$ (“n choose X”) counts how many different ways there are of getting X successes out of n trials, because it counts the positions the successes could be in. For instance, ${}_{6}C_{2}$ counts the number of ways of getting exactly 2 heads out of 6 flips, because it counts the positions those two heads could be in (1st and 2nd, 1st and 3rd, etc.).
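To make the counting concrete, the positions can simply be enumerated. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from itertools import combinations
from math import comb

positions = range(1, 7)  # flips numbered 1 through 6
two_heads = list(combinations(positions, 2))  # every pair of positions for the 2 heads

print(len(two_heads), comb(6, 2))  # the enumeration and the formula agree
print(two_heads[:3])               # starts with (1, 2), (1, 3), (1, 4)
```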

Which is why I typed “p-test”. What proportion of values are heads compared to the 50% of the universal population. Anyway, to the OP, generally 30 data points are considered enough to make a statistical comparison. Like I said, I’m neither all that smart nor willing to look things up right now; my stats books are at work. So here’s a wiki link.

*For example, an experiment is performed to determine whether a coin flip is fair (50% chance of landing heads or tails) or unfairly biased, either toward heads (> 50% chance of landing heads) or toward tails (< 50% chance of landing heads). (A bent coin produces biased results.)

Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result would be the chance of a fair coin landing on heads at least 14 times out of 20 flips. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.0577. Thus, the p-value for the coin turning up heads 14 times out of 20 total flips is 0.0577.*
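The tail probability in the quoted example can be checked by direct counting, since every sequence of 20 fair flips is equally likely. A minimal sketch (the function name is hypothetical); it prints the result for two neighboring thresholds so the sensitivity to the head count is visible:

```python
from math import comb

def fair_coin_tail_probability(min_heads, flips):
    """P(at least min_heads heads in `flips` fair flips), by counting sequences."""
    favorable = sum(comb(flips, k) for k in range(min_heads, flips + 1))
    return favorable / 2**flips

for min_heads in (13, 14):
    print(min_heads, round(fair_coin_tail_probability(min_heads, 20), 4))
```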

Once the computer starts tossing six coins, doesn’t the event cease to be binomial? When tossing six coins, there are seven outcomes, and they are not all equally likely.



1) Six heads                    1 way    1.56%
2) Five heads / one tail        6 ways   9.38%
3) Four heads / two tails      15 ways  23.44%
4) Three heads / three tails   20 ways  31.25%
5) Two heads / four tails      15 ways  23.44%
6) One head / five tails        6 ways   9.38%
7) Six tails                    1 way    1.56%


If I offered to pay you a dollar every time the outcome of your choice came up when tossing six coins, three heads and three tails would be the best selection.

Alternatively, say I’ve got a coin that’s rigged to come up heads 99% of the time, and I offer to pay you a penny every time six heads come up but you have to pay me 99¢ any time any other outcome occurs. After 100 tosses, I would come out ahead, because there are 63 ways for me to win but only one for you.

Yes, there are 63 ways for you to win, but none of those ways is very likely.
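To put numbers on that, one can list all 64 head/tail sequences for six coins and weight each by its probability. A sketch in Python, assuming each coin independently lands heads with probability 0.99 (the names are illustrative):

```python
from itertools import product

P_HEADS = 0.99  # assumed probability of heads for each rigged coin

def sequence_probability(seq):
    """Probability of one specific sequence of 'H'/'T' results."""
    prob = 1.0
    for toss in seq:
        prob *= P_HEADS if toss == "H" else 1 - P_HEADS
    return prob

sequences = list(product("HT", repeat=6))     # all 64 ordered outcomes
others = [s for s in sequences if "T" in s]   # the 63 outcomes with at least one tail

all_heads = sequence_probability(("H",) * 6)
others_total = sum(sequence_probability(s) for s in others)

print(len(others), "other sequences, total probability", round(others_total, 4))
print("six heads probability", round(all_heads, 4))
```

The single all-heads sequence carries most of the probability; the 63 others share what is left.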

No, the distribution is still binomial, with parameters n = 6 and p = 2/3. Here’s the actual correct distribution for the number of heads:



Heads  Probability
6      0.0878
5      0.2634
4      0.3292
3      0.2195
2      0.0823
1      0.0165
0      0.0014


Here’s a question for you: You have a coin that’s rigged to come up heads two thirds of the time. Since there’s only one way it can come up heads and only one way it can come up tails, why doesn’t that mean that the two outcomes are equally likely?

You’ll often see 30 points as a rule of thumb for a normal approximation, but that has nothing to do with statistical power or with the problem at hand (there’s no reason to use a normal approximation here).

If you gave me 2:1 odds, then we should come out even.

Say we have a coin that’s rigged to come up heads 99% of the time. I owe you 1¢ every time it comes up heads and you owe me 99¢ every time it comes up tails. After 600 tosses I should owe you 594¢ and you should owe me 594¢.

Now, say we take six coins rigged to come up heads 99% of the time. I owe you 1¢ every time six heads come up and you owe me 99¢ every time any other outcome comes up. After 100 tosses I should owe you 94¢ and you should owe me 594¢.

Even though both tests involved 600 coin tosses, the second works in my favor because it is not binomial. Here’s an Excel spreadsheet showing the results of 10 tests.
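The expected payoff per round for the two bets described above can be worked out directly. This is a rough sketch, assuming each coin independently lands heads with probability 0.99; the function name and parameters are made up for illustration.

```python
def expected_gain_cents(p_win, win_cents=1, lose_cents=99):
    """Expected gain per round for the player who collects win_cents with
    probability p_win and pays lose_cents otherwise."""
    return p_win * win_cents - (1 - p_win) * lose_cents

single_coin = expected_gain_cents(0.99)      # win on heads from one coin
six_coins = expected_gain_cents(0.99 ** 6)   # win only on six heads at once

print("one coin per round: ", round(single_coin, 4), "cents")
print("six coins per round:", round(six_coins, 4), "cents")
```

On these assumptions the one-coin bet breaks even, while the six-coin bet loses several cents per round for the player collecting pennies. The 99-to-1 payout matches the odds of a single head but not the odds of six heads in a row, and that six-heads probability is itself what the binomial formula gives for n = 6.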

I don’t know how you generated that spreadsheet, but ten trials is nowhere near enough. Here’s R code for flipping six of your biased coins 100,000 times and printing the results:



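p <- 2/3       # probability that a single coin comes up heads
n <- 100000    # number of six-coin tosses to simulate
k <- 6         # coins per toss
# n-by-k logical matrix: TRUE marks a head (uniform draw below p)
m <- matrix(runif(n*k) < p, ncol = k)
# Count the heads in each row, then tabulate how often each count occurs
table(rowSums(m))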


For one run, the results agree very closely with the distribution I posted above:



Heads  Count
6      8706
5      26238
4      32725
3      22081
2      8418
1      1693
0      139
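For a side-by-side comparison, the counts above can be set against the counts a binomial distribution with n = 6 and p = 2/3 would predict for 100,000 tosses. A small Python sketch; the observed numbers are copied from the run above and the variable names are illustrative.

```python
from math import comb

# Observed counts from the simulation run above, keyed by number of heads.
observed = {6: 8706, 5: 26238, 4: 32725, 3: 22081, 2: 8418, 1: 1693, 0: 139}
total = sum(observed.values())
p = 2 / 3

# Expected count for each head count under Binomial(6, 2/3).
expected = {k: total * comb(6, k) * p**k * (1 - p)**(6 - k) for k in observed}

for k in sorted(observed, reverse=True):
    print(f"{k} heads: observed {observed[k]:6d}, expected {expected[k]:9.1f}")
```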


(You may also want to do some reading on the binomial distribution that I linked to earlier. It’s not what you seem to think it is.)