Assume I have a properly balanced coin and am tossing it in a manner that gives it a 50% chance of coming up heads.
I’ve flipped it 5 times and got five heads.
Since these are independent events, the chances of coming up heads on the 6th flip should be 50%.
But the principle of regression towards the mean says that a larger sample size will yield a truer average.
The 6th flip would increase the sample size by one, so would this make the chances of the 6th flip coming up heads slightly less than 50% (and the chances of coming up tails greater than 50%)?
Also, I seem to remember there being something called the Law of Independent Outcomes, but I couldn’t find anything by that name. Is there an official name for this?
A larger sample size will tend to yield a truer sample mean. After the 6th toss you’ll have either 6 heads or 5 heads and a tail. On average you’ll have a sample mean of (0.5·5 + 0.5·6)/6 = 5.5/6 = 11/12, which is closer to the true mean of 1/2 than the sample mean of 1 you currently have. Note that I’ve used independence in assigning probability 0.5 to each of the two outcomes.
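If it helps to check the arithmetic, here’s a quick Python sketch of that calculation (the setup is mine, assuming the fair-coin model in the question):

```python
import random

# Direct calculation: after 5 heads, the 6th toss is heads or tails with
# probability 1/2 each, so the expected head count is 5.5 and the
# expected sample mean is 5.5/6 = 11/12.
expected_mean = (0.5 * 5 + 0.5 * 6) / 6
print(expected_mean)  # 0.9166... = 11/12

# Monte Carlo check: start from 5 heads, flip once more, average the means.
trials = 100_000
total = sum((5 + random.randint(0, 1)) / 6 for _ in range(trials))
print(total / trials)  # close to 11/12
```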
Another way to think about it: suppose you’ve flipped twice and have one head, so your sample mean is 1/2, exactly what it should be. The next toss has to make it worse: after three flips your sample mean is either 1/3 or 2/3, both farther from 1/2.
A 50% chance of the next flip coming up heads is what causes regression towards the mean:
As it stands, you have a 100% heads rate. But there’s a 50% chance it’ll stay at 100% on the next flip and a 50% chance it’ll drop to 5/6 ≈ 83%. So, on average, the next flip will bring your heads rate down, to 5.5/6 ≈ 92%.
Regression towards the mean is not some magical physical force: it’s just the tautological observation that an uncommon result is less likely to occur than a common result, even when an uncommon result has already happened before.
In other words, if you gather a bunch of people who’ve managed to get 5 heads in a row, and then have them each flip again, half of them will see their heads rate drop to 83%, and thus the group overall will see its heads rate drop to 92%. That’s regression towards the mean. It’s entirely driven by the fact that the coin probabilities stay 50/50 on each new trial, for lucky and unlucky flippers alike; there’s no further shifting involved.
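A small simulation (my own sketch, not part of the answer above, assuming the fair-coin model) makes the group picture concrete: generate lots of flippers, keep only those who happened to get 5 heads in a row, have each flip a 6th time, and watch the group’s heads rate fall from 100% to about 92%:

```python
import random

random.seed(0)

# Simulate many people flipping a fair coin 5 times; keep the ones who
# got 5 heads in a row (about 1 in 32 of them).
people = [[random.randint(0, 1) for _ in range(5)] for _ in range(200_000)]
lucky = [flips for flips in people if sum(flips) == 5]

# Each lucky flipper now flips a 6th time; the coin is still 50/50.
for flips in lucky:
    flips.append(random.randint(0, 1))

# The group's heads rate drops from 100% to roughly 5.5/6 ~= 92%, purely
# because the 6th flip is fair -- that's regression towards the mean.
rate = sum(sum(flips) for flips in lucky) / (6 * len(lucky))
print(f"{len(lucky)} lucky flippers, heads rate after 6th flip: {rate:.3f}")
```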
I really wish I could remember who said it, but there’s a great quote on this subject: “The law of large numbers doesn’t work by correcting past mistakes; it works by making them insignificant.”
Let’s say that I’m going to flip a coin 1000 times. What’s the most likely number of heads? 500. OK, now let’s say that I’ve started my 1000 flips, and by luck, the first 10 happen to all have been heads. Now what’s the most likely number of heads? Well, I’ve got 10 heads so far, and out of the remaining 990 flips, the most likely number of heads is 495. So now the most likely total is 505, not 500. But 505 is really close to 500. And if I decide that I’m going to flip a million times, then the most likely number of heads is 500,005, which is really, really close to half of them. No matter how many flips I decide to make, the most likely number of heads is always going to be 5 more than half of them, but if I flip a very large number of times, that extra 5 is piddly by comparison.
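Here’s a short sketch (my own numbers, following the same reasoning) showing how that fixed surplus of 5 heads shrinks as a fraction of the total as the number of flips grows:

```python
# After 10 initial heads, the most likely outcome for the remaining
# (n - 10) fair flips is half of them, so the most likely total is
# 10 + (n - 10) / 2 = n / 2 + 5.
for n in (1_000, 1_000_000):
    most_likely = 10 + (n - 10) // 2
    print(n, most_likely, most_likely / n)
# 1000      505      0.505
# 1000000   500005   0.500005
```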
On a quick read, some of this seems to be skirting the issue of the Gambler’s Fallacy. If the flips are really independent with a perfectly balanced coin (which is the model you’re assuming), then P(heads on the 6th flip) = 1/2. You’re not “due” tails. By independence, P({flip heads 6 times in a row}) = (1/2)^6. But also, P({flip heads 5 times in a row} and {flip tails on 6th toss}) = P({flip heads 5 times in a row}) · P({flip tails on 6th toss}) = (1/2)^5 · (1/2) = (1/2)^6.
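To see that numerically, here’s a short sketch (mine, under the same fair-coin assumption) that prints the shared probability (1/2)^6 and then conditions on the first 5 flips being heads to check the rate of heads on the 6th:

```python
import random

random.seed(1)

# Both specific sequences have the same probability: (1/2)^6.
print((1 / 2) ** 6)  # P(HHHHHH) = P(HHHHHT) = 0.015625

# Empirically: among runs whose first 5 flips were all heads, the 6th
# flip comes up heads about half the time -- no "due" tails.
sixth_flips = []
for _ in range(500_000):
    flips = [random.randint(0, 1) for _ in range(6)]
    if sum(flips[:5]) == 5:
        sixth_flips.append(flips[5])
print(sum(sixth_flips) / len(sixth_flips))  # approximately 0.5
```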
The coin is a model, and the model produces data, not the other way around.