What are examples of authentic Bell Curves?

As long as we haven’t yet got smacked down by a mod or OP for going too far O/T in the discussion of the Pareto principle, I think some of these claims could use a little more interrogating.

In particular, this idea of the scalability of the “80/20 rule” in productivity really seems kind of dubious. Say I’ve got an organization of 500 employees, and the 80/20 rule is telling me that 80% of the organization’s productivity comes from 20% or 1/5 of them, i.e., 100 employees. Okay, but then you say 80% of that 80%, or 64% of the overall productivity, comes from just 20% of that 100. That is, twenty individuals in my 500-person organization are responsible for nearly two-thirds of its productivity? Okay…

Continuing with that reasoning, then, 80% of that 64%, or over 51% of the entire firm’s productivity, is due to 20/5 = 4 employees. And 80% of that 51%, or over 40% of the whole, is due to only one (well, 4/5 actually, but I rounded up to avoid dismemberment) individual?
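For anyone who wants to check that arithmetic, here is a minimal sketch that applies the 80/20 split repeatedly to the top group. The function name `nested_pareto` and its parameters are made up for illustration; it only reproduces the naive "scale it down every level" reasoning, not any claim about how real organizations behave.

```python
def nested_pareto(headcount, levels, share=0.8, fraction=0.2):
    """Apply the 80/20 split to the top group, level after level.

    Returns a list of (level, people_in_top_group, share_of_total_output).
    """
    rows = []
    people, output = float(headcount), 1.0
    for level in range(1, levels + 1):
        people *= fraction   # top 20% of the previous group
        output *= share      # 80% of the previous group's output
        rows.append((level, people, output))
    return rows


rows = nested_pareto(500, 4)
for level, people, output in rows:
    print(f"level {level}: {people:g} people -> {output:.1%} of total output")
```

Four levels down, the top group is 0.8 of a person carrying roughly 41% of the total, which is the rounded-up "one individual" in the post.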

Really? Four-tenths of all the productivity in my entire organization is due to just one person? I sure hope they’re driving very carefully!

Now, you may reasonably argue that the 80/20 principle is just a rough rule of thumb and isn’t scalable across multiple levels like that. Okay, then, what made you think it was reasonable to scale it over even one level? What are the actual mathematical and empirical constraints that need to be applied to this rule to tell us when, and to what extent, we can draw reliable inferences from it?

This anecdote about Pareto and the peas gets the Spock-eyebrow from me as well. Certainly, I can find all sorts of online references, mostly embedded in the genre of writing known as “corporate motivational bullshit”, to Pareto’s alleged discovery in 1896 (nice specificity) that 20% of the pods in his pea harvest yielded 80% of the peas.

But on skimming and searching several actual scholarly biographies of Pareto, I didn’t find any mention of peas in any of them. And I began to think about my own gardening experiences, in particular with pea crops, and the alleged results seemed… odd.

If I’ve got a hundred pea pods, for example, and the best 20% of them yield on average an impressive 5 peas each, I’ll get (20)(5) = 100 peas from the top 20% of my crop. If the remaining 80 pods yield an average of a measly one pea each, that’ll be 80 peas, which is a lot more than 20% of the whole.

Check my math, but I make it that I would need the best 20 pods yielding an average of 7 peas each, and the other 80 yielding an average of 0.5 peas each, to get close to the claimed “80% of peas come from 20% of the pods”. Wow, that is apparently one shitty pea variety Pareto’s got there!
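Taking up the "check my math" invitation, here is a quick sketch. The helper `top_share` is a hypothetical name; it just computes what fraction of all peas comes from the top group of pods, given the average yield of each group. Algebraically, an exact 80/20 split with 20 top pods and 80 others needs the top pods to average 16 times the rest, since 20a / (20a + 80b) = 0.8 reduces to a = 16b.

```python
def top_share(top_pods, top_avg, rest_pods, rest_avg):
    """Fraction of all peas that comes from the top group of pods."""
    top = top_pods * top_avg
    rest = rest_pods * rest_avg
    return top / (top + rest)


five_vs_one = top_share(20, 5, 80, 1)       # the 5-peas vs 1-pea example
seven_vs_half = top_share(20, 7, 80, 0.5)   # the 7 vs 0.5 guess
eight_vs_half = top_share(20, 8, 80, 0.5)   # a 16:1 ratio

print(f"5 vs 1:   {five_vs_one:.1%}")
print(f"7 vs 0.5: {seven_vs_half:.1%}")
print(f"8 vs 0.5: {eight_vs_half:.1%}")
```

So 7 vs 0.5 lands a little under 80%, and 8 vs 0.5 hits it exactly. Either way, that is a pea variety with a lot of nearly empty pods.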

My own garden pea harvests aren’t such a much, but I’ve never had a crop where four-fifths of the pods were nearly empty and the other one-fifth were crammed to bursting. If I did, I’d write a nasty letter to the seed company about it. That’s not how agricultural breeders design their plant yields to work. (What I do tend to get, in fact, are a lot of pods with a few peas in them, and a few pods crammed with many peas, and a few pods with one or no peas: bell-curve distribution FTW.)

So, subject to valid correction which I promise to receive in a becoming spirit of humility if forthcoming, I’m gonna hypothesize that this claim about Pareto’s peas is derived from some corporate-motivational-bullshit urban legend. AFAICT, there is no reason to think it’s reliable regarding the realities of either Pareto’s biography or pea yields.

Which is another reason I think we need to be careful about the hype surrounding the Pareto principle. As I said, we need meaningful mathematical and empirical constraints to make sure we can tell the difference between reliable models for real-world data and superficial “mathy” speculation.

You must admit that result certainly explains the CEO’s paycheck. After all, he (almost always “he”) deserves a full 40% of the firm’s profits since he created them singlehandedly by his sheer awesomeness.

The paragraph above is joking of course. But overall, congrats on two excellent posts.


Continuing the hijack with an interesting side observation:

The Pareto principle the management consultants are so fond of asserts that massive income inequality is inevitable and therefore should not be pushed back against. Benfield’s law is often trotted out to make the same argument. But the two very legit mathematical principles produce two very different distributions.

Therefore either one of them is mathematical BS, or their assertions about the social/economic inevitability of their predicted outcome are BS. Anyone care to guess which of those is the actual factual correct answer?

? Benford’s law? Not having any luck tracking down a Benfield in this context…

But yeah, I agree that one of the main uses of spurious “mathification” of organizational processes is often to give some veneer of theoretical justification to massively inequitable compensation policies.

In the meantime, trying to get back to the main topic of the thread, I suggest that the OP’s project could honor the mythical “Pareto peas” by having students collect data on pod length of supermarket peas for a bell-curve fit. Informative and nutritious. :grin:

FWIW, I don’t really want to talk about pure human performance. I was making an aside that, absent reputable evidence, I was unconvinced; I wasn’t in any way trying to turn a perfectly interesting thread about one subject into a thread on an entirely different subject.

Sorry for eliciting this change of direction, @Saint_Cad; at this point I’m mostly interested in following along with your original question.

Benford’s law

Benford’s law is one that you can easily send your students out to verify – use an almanac and have them make up histograms of the occurrence of each first digit in lists of populations of cities, or lengths of rivers, or areas of countries or counties, or any such data. Easier to find than tabulated data of things that ought to follow a Bell Curve.
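A rough sketch of that classroom exercise follows. Benford's law gives the expected frequency of leading digit d as log10(1 + 1/d). Since I don't have an almanac table to paste in, the `values` list below uses powers of 2 as a stand-in dataset (an assumption for illustration; powers of 2 are a standard example that tracks the law closely). Students would swap in their own city populations or river lengths. The function names are made up for this example.

```python
import math
from collections import Counter


def benford_expected(d):
    """Expected frequency of leading digit d (1..9) under Benford's law."""
    return math.log10(1 + 1 / d)


def first_digit_frequencies(values):
    """Observed frequency of each leading digit among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Stand-in data: replace with almanac figures (populations, areas, ...).
values = [2 ** k for k in range(1, 1001)]
observed = first_digit_frequencies(values)

print("digit  observed  expected")
for d in range(1, 10):
    print(f"{d:>5}  {observed[d]:8.3f}  {benford_expected(d):8.3f}")
```

With real almanac data the match will be noisier, especially for short lists, so it is worth having students pool their tallies.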

In fact, a less obvious exercise is to plot out the frequencies of occurrence of pairs of digits – the first two digits of each of these entries. The pairs of digits will also follow a logarithmic Benford curve, but you’ll need 90 histogram bins (10 through 99) instead of nine, and you’ll need to plot a lot more numbers for the shape to be visible.
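To give a feel for why the two-digit version needs so much more data, here is a small sketch of the expected frequencies. It uses the same logarithmic formula, log10(1 + 1/n), applied to the leading pair n from 10 to 99. The variable names are my own.

```python
import math

# Expected frequency of each leading two-digit pair under Benford's law.
two_digit = {n: math.log10(1 + 1 / n) for n in range(10, 100)}

total = sum(two_digit.values())
most_common = max(two_digit, key=two_digit.get)
rarest = min(two_digit, key=two_digit.get)

print(f"bins: {len(two_digit)}, total probability: {total:.6f}")
print(f"most common pair {most_common}: {two_digit[most_common]:.2%}")
print(f"rarest pair {rarest}: {two_digit[rarest]:.2%}")

# Summing the pairs that start with 1 recovers the single-digit law for 1.
leading_one = sum(p for n, p in two_digit.items() if n // 10 == 1)
print(f"pairs 10-19 combined: {leading_one:.3f}")
```

The rarest bin is expected well under 1% of the time, so a list of a few hundred entries will leave many bins nearly empty.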

As for normal data, I wonder if movie length (in minutes) for recent movies (past decade or so) might be close to normally distributed. I’m not sure where to find that data summarized though.

Also be careful using attributes of humans if the value is self reported. Way back in my high school days, circa 1980, the chemistry teacher wanted to give a quick lesson on displaying data using a bar chart. He had everyone report their height and give their gender, then he made separate bar charts for boys and girls. He noted that among the boys, 10 reported 5’10", 1 reported 5’11", and 10 reported 6’0". He said that males anywhere close to 6’ tend to say that they are 6’ whether they really are or not. I’m sure that women meeting men using computer dating sites can attest to the number of “6 foot” men who turn out to be much shorter.

Gaaah! Yes, Benford. Thank you.

I thought about providing a cite which would have caught my brain fart but I figured the folks running the hijack with me didn’t need it. Turns out I was the one who needed it. :man_facepalming:

Would Benford’s law simply show that some number distributions are not even, but logarithmic? Hence lower numbers are over-represented on a linear scale.

Benford’s law demonstrates that one boulder can be broken into a lot of small rocks and pebbles.

Thank you for the detailed reply. So if I understand correctly:

Let’s say I have a population that doesn’t follow a Gaussian distribution. The central limit theorem says that, if I look at the means from many samples, these means will follow a Gaussian distribution. Furthermore, this implies the original population is not Gaussian because not enough samples were taken. And that it would be (essentially) Gaussian if more samples were taken.

Is that correct?

Pretty much, although I’ll add that you have to have your other distributions added in with random means and widths in order for them to eventually look like a Bell Curve. If you added up a bunch of bimodal distributions with both of the peaks always in the same place, you’d end up with a bimodal distribution, not a normal curve. Which is as it should be – we DO see bimodal distributions under the right circumstances, as a result of taking a lot of data points. Or Poisson distributions or binomial distributions or whatever.

But

Central Limit Theorem (CLT): Definition and Key Characteristics.

Bimodal distributions will still tend towards Gaussians, if you’re looking at the mean.

Take the ultimate bimodal distribution, a roll of a 1d2 (that is, it has a 50% chance of giving a 1, and a 50% chance of giving a 2, and no chance of anything else). Now roll 10d2, and divide the result by 10: You’re probably going to get something close to 1.5. Whatever the result is, record it. Then roll 10d2 again, divide by 10, and record that result, and so on. Repeat many times, and the averages you recorded will still show something very close to a Gaussian distribution, even though you were starting from something perfectly bimodal.
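If you don't have a bag of two-sided dice handy, the experiment is easy to simulate. This is a minimal sketch using Python's standard `random` module. The seed, the number of repetitions, and the helper name `mean_of_dice` are arbitrary choices of mine, not part of the original description.

```python
import random
import statistics
from collections import Counter


def mean_of_dice(rng, n_dice=10, sides=2):
    """Roll n_dice dice with the given number of sides; return the average."""
    return sum(rng.randint(1, sides) for _ in range(n_dice)) / n_dice


rng = random.Random(0)  # fixed seed so the run is repeatable
means = [mean_of_dice(rng) for _ in range(20000)]

hist = Counter(means)
for value in sorted(hist):
    print(f"{value:.1f} {'#' * (hist[value] // 100)}")

print("mean of the recorded averages:", round(statistics.mean(means), 3))
print("std dev of the recorded averages:", round(statistics.stdev(means), 3))
```

The text histogram should peak at 1.5 and fall off roughly symmetrically on both sides. Strictly speaking it is a scaled binomial, which is what the Gaussian approximates here.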

As stated, I would say this is not correct. If you take many samples from a distribution and plot each, you’ll reproduce the actual distribution. You get a Gaussian by looking at the averages of many large samples. (Actually, in the limit as the size of the samples becomes larger.)
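The distinction can be illustrated with a strongly skewed distribution. In this sketch I draw from an exponential distribution (my choice of example, not something from upthread) and compare the skewness of the raw draws with the skewness of the averages of samples of 50. The `skewness` helper, the seed, and the sample sizes are all assumptions for illustration.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment over variance to the 3/2 power."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Many individual draws: this reproduces the skewed distribution itself.
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Averages of many samples of size 50: this is what the CLT talks about.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(2000)
]

raw_skew = skewness(raw)
means_skew = skewness(means)
print(f"skewness of raw draws:    {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

Taking more raw draws never makes the first number shrink toward zero; only averaging does, and larger samples per average bring it closer.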

That’s what he said: “If I look at the means from many samples”.

True. The part I was reacting to was, “Furthermore, this implies the original population is not Gaussian because not enough samples were taken. And that it would be (essentially) Gaussian if more samples were taken.” This seems to suggest that sampling the original distribution would produce a Gaussian distribution if enough samples were taken, rather than reproducing the distribution itself, but I may be misreading that.