Randomness and the Bell Curve

Math wasn’t my strongest subject in school, and I should know the answer to this…

You have 100 people, and each person is given a coin to flip with a head and a tail on it. They are also given a sheet of paper where they are to tally the results of their flips. Each person is asked to flip their coin exactly 100 times, and mark on their sheet whether it landed heads or tails.

When everyone is done, the tallies are collected and the data is plotted on a curve. Assume the coins and the flips are completely random, with no biases. Since there is an equal chance of getting a head or a tail, I’d expect the majority of people to have splits that are roughly 50 heads and 50 tails, but some will have gotten more tails than heads, and vice versa.

Would you expect to end up with a bell curve with the middle being those around 50/50 and one end those who had far more heads than tails and the other end far more tails than heads, and with the curve filled in with various combinations of heads and tails? Or would you expect something different?

Yes.

You can see it here. Make sure the probability of a success is set to 0.5, and put in 100 for the number of trials (n).
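
If you’d rather simulate the experiment than plug numbers into a calculator, here’s a rough sketch in Python (standard library only) of the o.p.’s setup: 100 people each flipping a fair coin 100 times, with a crude text histogram of how many heads each person got.

    import random
    from collections import Counter

    # 100 people, each flipping a fair coin 100 times; count heads per person
    heads_per_person = [sum(random.random() < 0.5 for _ in range(100))
                        for _ in range(100)]

    tally = Counter(heads_per_person)
    for heads in range(min(tally), max(tally) + 1):
        print(f"{heads:3d} heads: {'*' * tally[heads]}")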

The distribution is actually a binomial distribution with p=0.5 and N=100, because it is a discrete probability function rather than a Gaussian (“Bell”) curve, but yes, in the case of a purely random coin flip you would get an unbiased distribution within probability limits. What are those limits? In this case, you can analytically calculate the likelihood of more than m people getting 55 or more heads in a 100-toss trial and then run the experiment over and over to see whether the variation ‘regresses to the mean’ (which is the expectation). If it does not, it is a strong indication that something is wrong with the coin or some other aspect of your trials, and you can formulate a hypothesis (that there is some systematic deviation) to be tested against the “null hypothesis” (an unbiased distribution with a consistent standard error).
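
To put a number on those limits, the exact binomial probabilities can be computed directly; here’s a quick sketch in Python (math.comb has been in the standard library since 3.8), using the 55-heads threshold above purely as an example:

    from math import comb

    n = 100                                           # flips per person
    p_55_plus = sum(comb(n, k) for k in range(55, n + 1)) / 2**n
    print(p_55_plus)                                  # chance one person gets 55+ heads
    print(100 * p_55_plus)                            # expected number of such people out of 100

It comes out to roughly 0.18, so in a room of 100 flippers you’d expect somewhere around 18 of them to land at 55 or more heads just by chance.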

This is how statistical trials for medicine, component reliability, et cetera are done, and while statistics never tells you anything definitively, it certainly can suggest significant trends.

As an aside, we’ve been avoiding the use of the term “Bell Curve” since the book of the same name came out. The preferred nomenclature is Gaussian or normal (for continuous distribution) or binomial (for discrete distribution) to avoid confusion or any association with the controversial sociopolitical claims in the book.

Stranger

Thank you both. Whew! I was hoping it was that simple.

That’s very cool. Bookmarked.

Note this doesn’t apply just to coin flips but for most of the random events we would typically run across in real life (plenty of mathematical nit-pickery on when it does and doesn’t work, but let’s not get lost in the weeds here).

Events will tend to look normally distributed (i.e. like a bell curve) as you take more and more random samples. There’s even a name for it - the Central Limit Theorem. The Wiki article even has a coin flip example (more or less - it’s using 0s and 1s) and an example with dice.
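
The dice version is easy to reproduce; here’s a rough sketch in Python (standard library only): roll a handful of dice, add them up, repeat many times, and the histogram of the sums starts to look bell-shaped even though a single die is perfectly flat.

    import random
    from collections import Counter

    NUM_DICE, NUM_TRIALS = 5, 10_000
    sums = [sum(random.randint(1, 6) for _ in range(NUM_DICE))
            for _ in range(NUM_TRIALS)]

    tally = Counter(sums)
    for total in range(NUM_DICE, 6 * NUM_DICE + 1):
        print(f"{total:2d}: {'*' * (tally[total] // 20)}")   # crude scaled histogram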

Yes. Consider 4 coin flips. Each flip has 2 possible outcomes, so 2x2x2x2=16 possible outcomes.

Of those, there are only 4 sets that can result in exactly 3 heads (THHH, HTHH, HHTH, HHHT), and only 1 that can result in 4 heads (HHHH). Meanwhile there are 6 ways to get exactly 2 heads (HHTT, HTHT, HTTH, THHT, THTH, TTHH).

If I can still do math…

With 6 tosses, there are 2^6=64 possible outcomes, of which 20 are even (3H, 3T), 15 are biased to 4H-2T, and only 6 out of 64 to 5H-1T.

With 10 tosses, out of 1024 outcomes, 252 are even 5-5, 210 are 6H-4T, 120 are 7H-3T, 45 are 8H-2T, 10 are 9H-1T, and only one outcome in 1024 is 10H. (And symmetrically for extra T.)

And so on - the binomial (Pascal’s triangle) distribution that Stranger mentions. As you can see, deviating too far from the even split with an unbiased random coin flip becomes highly unlikely, compared to the even split itself.
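
Those counts are just the binomial coefficients; here’s a quick check in Python for the 4-, 6-, and 10-toss cases above:

    from math import comb

    for n in (4, 6, 10):
        counts = [comb(n, k) for k in range(n + 1)]   # ways to get k heads in n tosses
        print(f"n={n:2d}  total outcomes={2**n:5d}  counts={counts}")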

My handy dandy Windows scientific calculator (assuming I can use it right, right formula, etc.) says the number of ways to get an even split of 100 coin tosses is 100!/(50! 50!) = 100,891,344,545,564,193,334,812,497,256, or about 1x10^29.
Whereas getting exactly 55 heads is 100!/(45! 55!) = 61,448,471,214,136,179,596,720,592,960, or about 6x10^28.
How about 60 heads? 100!/(60! 40!) = 13,746,234,145,802,811,501,267,369,720, or about 1.4x10^28.

So, comparatively, for every 100 times you get an exact 50-50 split, you’d expect to get a 55-45 split about 61 times, a 60-40 split about 13 or 14 times, and so on.

Whereas, at the fine tail end, there is only a 1 in 2^100 chance it is all heads, but 100 in 2^100 chances of exactly one not-heads. Both are what mathematicians call “very unlikely.”
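
The same arithmetic, for anyone who wants to check it without squinting at a calculator (Python integers are arbitrary precision, so the exact values come out whole):

    from math import comb

    c50, c55, c60 = comb(100, 50), comb(100, 55), comb(100, 60)
    print(c50)                      # ways to get exactly 50 heads
    print(c55, c55 / c50)           # exactly 55 heads, and its ratio to the 50-50 count
    print(c60, c60 / c50)           # exactly 60 heads, and its ratio to the 50-50 count

    # The extreme tail: all heads, and exactly one tail
    print(1 / 2**100, 100 / 2**100)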

The reason that random results of a population with a single variable tend to follow a Gaussian distribution is explained by the Central Limit Theorem. If you add up (or average) enough samples from a population whose variable has almost any probability distribution, the result eventually looks like that bell-shaped curve.

Central Limit Theorem.

As an example, if you add up enough samples from a random number distribution between 0 and 1, the resulting distribution of sums looks more and more like a gaussian. In fact, a quick and easy way to create a population of pseudo-random numbers with a gaussian distribution is to add up twelve random numbers from a pseudo-random number generator in the range zero to one and subtract 6. This gives you a population centered on zero with a standard deviation of 1 (the possible values range from -6 to +6). What’s interesting is that you can get a gaussian-looking result even if the probability distribution for each random number isn’t uniform, but lopsided. Use enough numbers, and eventually it evens out.
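
Here’s a minimal sketch of that trick in Python (standard library only); the sum of twelve uniform(0, 1) draws has variance 12 × 1/12 = 1, so subtracting 6 centers it on zero with a standard deviation of 1:

    import random
    from statistics import mean, stdev

    def approx_gauss():
        """Roughly standard-normal value built from twelve uniform(0, 1) draws."""
        return sum(random.random() for _ in range(12)) - 6.0

    samples = [approx_gauss() for _ in range(100_000)]
    print(mean(samples), stdev(samples))   # should land near 0 and 1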

I just want to be sure I understand, because I haven’t thought about this stuff in a long, long while. I know that for a large enough n, a set of sample means will tend to look normally-distributed…is it also true that the distribution of one very large sample will also look normally-distributed?

Not sure what you mean by “one very large sample”.

I would expect the heights of a large number of random people to follow something like a gaussian distribution. Or the weights of a large number of hummingbirds. In manufacturing, the whole Process Control strategy depends upon the parameters of manufactured parts (their sizes, or the resistance values of resistors, or the radii of curvature of lenses, or whatever) following a “normal” distribution.

But don’t expect the number of houses along a street, or the zip codes of a lot of people, or the values of refractive indices to follow this sort of distribution – they’re not really random.

Okay, thanks for the clarification.

So, this gets into an issue that people (even statisticians) often neglect; to wit, that a Gaussian (normal) distribution assumes random variability about a mean. This is a good assumption for many types of processes or variations, particularly random error, which is definitionally Gaussian. However, in many physical processes this assumption is not a correct representation. Height distributions in a well-nourished population tend to be relatively close to normal (although often with fatter tails than you’d expect, because the natural variability from genetic factors is a co-variant with developmental nutrition, and of course there are no healthy adults who deviate from the mean to be 1 ft or 10 ft in height, regardless of what the distribution predicts, because of physiological limitations), but in more unevenly or poorly nourished populations they are often closer to log-normal (a normal distribution applied to the logarithm of the measured parameter). Many other physical phenomena are also best represented as log-normal, such as random vibration events due to aeroacoustic loading in flight. And many non-natural phenomena, such as the distribution of sushi restaurants in Manhattan, aren’t even close to a random distribution, so you have to go to kernel density estimates to produce a useful prediction of likelihood.
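
As a rough illustration of the log-normal point (the numbers here are made up purely for the example): generate a skewed, strictly positive quantity by exponentiating a normal variable, and its logarithm looks normal again.

    import math
    import random
    from statistics import mean, median

    # Hypothetical log-normal quantity: exp() of a normal with mean 0.5, sd 0.4
    data = [math.exp(random.gauss(0.5, 0.4)) for _ in range(100_000)]
    print(mean(data), median(data))    # mean > median: the raw data is right-skewed

    logs = [math.log(x) for x in data]
    print(mean(logs), median(logs))    # the logs sit roughly symmetric about 0.5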

For the hypothetical experiment of the o.p., ‘normality’ is a good assumption because flipping a coin should be a definitionally random process with two discrete results of equal probability, but this is not true for all physical phenomena.

Stranger

For practical purposes, height is a pretty good case of following a normal distribution:
https://static1.squarespace.com/static/585718168419c246cf6f204e/t/5ab7e2de70a6adbbb6bcf676/1522000606378/STATISTICS%2B-Dimensions%2B-%2B3-19-18%2B(1).pdf

You can certainly find situations where it won’t – grabbing people from different populations where genetic makeup affects height, or where differences in nutrition or sex affect it. But if you grab a bunch of people living in the same area at the same time, a normal distribution is pretty likely.

By the way, calling this curve a “normal distribution” seems wrong, because it’s only normal for some phenomena. If you’re counting clicks per second on a Geiger counter, for example, you would not expect this curve. For one thing, you can never get a negative number of clicks (which this curve would mandate at least very occasionally). A Poisson distribution is what’s normal for this phenomenon. If you mount a laser on a turntable near a straight wall, and take snapshots of where along the wall the laser is hitting (at random moments), you will get yet another distribution, a Cauchy distribution, which has the interesting property that the average of some number of measurements is no better a predictor of the laser turntable’s position along the wall than any individual measurement is. This is because the extreme outliers will be VERY extreme and overwhelm the average. (A median, however, is an excellent predictor here.)
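
If you want to see that behavior for yourself, here’s a rough sketch in Python of the laser-and-wall setup: pick a uniformly random angle, project onto the wall (which gives a Cauchy distribution), and compare the mean and the median of the hits against the true center (0 here).

    import math
    import random
    from statistics import mean, median

    def wall_hit():
        """Laser at unit distance from the wall, snapped at a uniformly random angle."""
        angle = random.uniform(-math.pi / 2, math.pi / 2)
        return math.tan(angle)         # Cauchy-distributed position along the wall

    hits = [wall_hit() for _ in range(100_000)]
    print("mean:  ", mean(hits))       # swings wildly from run to run
    print("median:", median(hits))     # stays close to the true center, 0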

If you use the term “Gaussian distribution”, you’re not suggesting this normativity. Besides, Gauss himself was very cool, so why not?

Like it or not, the word “normal” has several technical meanings in mathematics and statistics that don’t match the most common layman’s meaning.

The conditions that lead to getting a Gaussian distribution (or at least, something very well approximated by a Gaussian) are quite common, but be careful, because conditions that don’t lead to a Gaussian are also quite common. This lack-of-normality is usually most pronounced far off in the tails of the distributions.

For instance, look at the distribution of age at death of people. On average, it’s somewhere in the 70s, and most people are within a decade of that. But if you try to construct a Gaussian distribution, you’ll find that there’d only be something like a 1 in 100 trillion chance of someone living to be more than 110, and yet there are several people out of a mere few billion who have done so. And the Gaussian model would predict even fewer people dying at age 10 or less, but that’s even more common in the real world.

Magician Derren Brown filmed himself tossing ten heads in a row - you can see his explanation of how he was able to do that here [start at 1:00]. And yes, it’s relevant to the discussion above.

“Normal” in the context of statistical distributions means that the moment is perpendicular, not that it is typical or expected. This means that the distribution is strictly symmetric about the mean (no bias); the standard normal distribution is centered about zero, whereas the general normal distribution (sometimes called the Laplace–Gauss distribution) allows for a non-zero mean while still conforming to the unbiased distribution with the defined standard deviation. There are more generalized Gaussian distributions that allow for exponential factors in the variance (thicker or thinner tails) and shape parameters for skewness (allowing the distribution to lean to one side) while still being ‘normally’ distributed about the mean.
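
For reference, the density being described, with mean μ and standard deviation σ (the standard normal is the special case μ = 0, σ = 1):

    f(x) = 1 / (σ √(2π)) · exp( −(x − μ)² / (2σ²) )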

A Poisson distribution is a progression-based distribution (usually assumed to be time but could be any metric over which the rate of events or measurements may vary), whereas the Gaussian distribution makes no assumptions about the rate of individual events or how they occur in relation to one another. It has a completely different purpose and is not “normal” in any mathematical sense, although the measurement error or innate variability of those events may itself be a normally distributed value.

Stranger

Can you clarify that last bit? I think of the Poisson as a distribution where the mean is equal to the variance and is a constant, not sure beyond that.

That is correct; the expectation (mean) and variance are the same. In practical application, it is a representation of the likelihood of k events occurring over some interval, or conversely how long an interval is needed to reach some probability P(k) of k events.
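
Here’s a quick numerical check of that in Python, building the Poisson probabilities from the recurrence P(k) = P(k−1)·λ/k starting at P(0) = e^(−λ); the rate λ = 4 is just an example:

    import math

    lam = 4.0                        # example rate: 4 events per interval
    pmf = [math.exp(-lam)]           # P(0) = e^(-lam)
    for k in range(1, 60):
        pmf.append(pmf[-1] * lam / k)    # P(k) = P(k-1) * lam / k

    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    print(mean, var)                 # both come out to about 4.0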

Stranger