Over time, statistical outliers converge toward the mean. But what confuses me are this concept’s very-long-term implications re: human intelligence, physiology/health and athletic performance.
Regression toward the mean seems to allow for backward extrapolation. As a thought experiment, then, IF we could isolate for nature’s inputs (in the nature vs. nurture debate), can we safely assume a greater incidence of very high and very low IQs in the general population, say, 15,000 years ago? Can regression toward the mean possibly suggest that the proportion of genius IQs in the general population is slowly decreasing, over a period of centuries and millennia? How does one reconcile this statistical concept vis-a-vis athletic performances of the last, say, 100 years? Is this too short a time frame? Are environmental factors too confounding?
You make a fundamental mistake: regression towards the mean is a statistical phenomenon, not a “real” phenomenon. It results from choosing members of the group that are statistical outliers, rather than randomly from a group as a whole. The actual average for the group doesn’t change, whether for height, intelligence, or anything else.
Regression towards the mean is a little bit more conditional than that: it does not occur in all data sets.
For example, think about regression towards the mean in its original context: heights of fathers and sons. It describes the fact that in Galton’s studies, very tall fathers tended to have shorter sons (and that very short fathers also tended to have taller sons). Now let’s imagine that something changes over time in the population… for example, better nutrition allows people to grow taller. By increasing the variance of height for the next generation, you will change the slope on the scatterplot of fathers’ height vs. sons’ height, and regression towards the mean might no longer be observed.
Related to that is the notion that heights, IQs, etc. in human populations will show more extreme outliers over time, rather than fewer (which you might believe if you thought regression towards the mean applied). There are larger populations of people in most successive cohorts, changes in environment, and over large periods of time, there will also be changes in genes contributing to these characteristics. So even if you could somehow control for nature v. nurture (I’m assuming here that by “nature” you mean genetics) you will find more extreme outliers over time.
Think of regression towards the mean as describing the phenomenon that without a mechanism, it is unlikely that successive data points from the same source will be outliers.
I think you’re confusing the actual meaning of “regression to the mean”. All it says is that things that are statistical outliers now are less likely to be statistical outliers later. It’s not some force of nature like the 2nd Law of Thermodynamics.
Basically, look at perhaps Nobel Prize winners. Let’s say we give people a measure of “five year achievement scores.” Nobel Prize winners are statistical outliers in terms of this measure, but when we measure them again in five years, they’re unlikely to have won another Nobel Prize. It doesn’t really relate to the distribution as a whole.
**Colibri** is right. There is some major confusion going on here. Regression toward the mean is a statistical concept.
Look at a normal curve. Notice that measurements of any normally distributed thing are going to pile up towards the middle, with values becoming less and less likely as you move away from the center.
Disclaimer: I am about to put this in very simplistic terms so it is easy to understand but could easily be the topic of several scientific texts.
Height is a trait that is normally distributed. Imagine a male 7’1’’ and a female 6’8’’ have a male child. What do you expect his height to be?
The answer is that the child will probably grow to be shorter than the father because the father is so far above the average that there is only a tiny area of the normal curve that could make him as tall or taller than the father. There is a huge area that would make him shorter.
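To put rough numbers on that, here is a minimal sketch. Everything in it is an assumption for illustration: a population mean of 69 inches, a standard deviation of 3 inches, a father-son correlation of 0.5, and a simple bivariate normal model that ignores the mother's height entirely. It is not Galton's data or method, just a way to see the size of the effect.

```python
from statistics import NormalDist

# Illustrative assumptions, not real population figures.
MEAN = 69.0    # population mean height, inches
SD = 3.0       # population standard deviation, inches
R = 0.5        # assumed father-son height correlation
FATHER = 85.0  # a 7'1" father

# Under a bivariate normal model, the son's height given the father's
# is normal with a mean pulled toward the population mean by factor R.
expected_son = MEAN + R * (FATHER - MEAN)
conditional_sd = SD * (1 - R**2) ** 0.5

son_dist = NormalDist(expected_son, conditional_sd)
p_taller = 1 - son_dist.cdf(FATHER)  # chance the son outgrows the father

print(f"Expected son height: {expected_son:.1f} in")
print(f"Chance son is taller than father: {p_taller:.4f}")
```

Under these made-up numbers the expected son is still well above average, just not as extreme as his father, and the chance of him being taller still is very small.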
This same example applies for people with parents very far above or below the mean. There is a statistical “pull” to the mean simply because there are so many more chances to be below the high outlier than there are to be above it.
On the other hand, there are a huge number of people in the middle of the distribution. They can easily produce someone a little taller or a little more intelligent than they are. Likewise, this large number of people will occasionally produce someone far above or below the average.
One other thing that might be mentioned (and which is covered by the site I linked to) is that “regression toward the mean” takes place in both directions. If you look at a time series, it takes place both forward and backward in time.
For example, if you pick the set of very tall fathers, their sons on average will be shorter than they are, and closer to the mean.
However, if you pick the set of very tall sons, their fathers will on average be shorter than they are, and closer to the mean.
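The two-way effect is easy to check with a quick simulation. This is only a sketch under assumed numbers (mean 69 inches, SD 3 inches, father-son correlation 0.5, equal variance in both generations), and the names `simulate_pairs` and `cutoff` are mine, not from any source.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MEAN = 69.0  # assumed population mean height, inches
SD = 3.0     # assumed standard deviation, same in both generations
R = 0.5      # assumed father-son correlation

def simulate_pairs(n):
    """Return n (father, son) height pairs with correlation R and equal variance."""
    pairs = []
    for _ in range(n):
        father = random.gauss(MEAN, SD)
        # Noise is scaled so the sons' overall SD matches the fathers'.
        noise = random.gauss(0, SD * (1 - R**2) ** 0.5)
        son = MEAN + R * (father - MEAN) + noise
        pairs.append((father, son))
    return pairs

pairs = simulate_pairs(100_000)
cutoff = MEAN + 2 * SD  # "very tall" means more than 2 SD above the mean

# Forward in time: select on the fathers, look at the sons.
tall_fathers = [(f, s) for f, s in pairs if f > cutoff]
fathers_of_tall = statistics.mean(f for f, _ in tall_fathers)
sons_of_tall_fathers = statistics.mean(s for _, s in tall_fathers)

# Backward in time: select on the sons, look at the fathers.
tall_sons = [(f, s) for f, s in pairs if s > cutoff]
tall_sons_mean = statistics.mean(s for _, s in tall_sons)
fathers_of_tall_sons = statistics.mean(f for f, _ in tall_sons)

print(f"Tall fathers avg {fathers_of_tall:.1f}, their sons avg {sons_of_tall_fathers:.1f}")
print(f"Tall sons avg {tall_sons_mean:.1f}, their fathers avg {fathers_of_tall_sons:.1f}")
```

In both directions, the group you did not select on lands between the selected group and the population mean, even though nothing in the model makes either generation shorter overall.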
Variance in the heights of fathers and sons takes place in both directions for all samples, whether at the extreme ends or in the middle. Although those at the extreme ends of the distribution will tend on average to have sons closer to the mean, those closer to the center of the distribution will produce some extreme individuals who will “replace” the ends of the distribution.
Take as another example a sample of fathers who are very close to the mean. Their sons will on average be farther from the mean than their fathers are. However, since variance takes place equally in both directions (barring some kind of environmental effect or selection), the average height of the sons will be exactly the same as that of their fathers. This sample is unable to show regression to the mean, because it is already at the mean.
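Here is a rough sketch of that "replacement" idea, again with assumed numbers (mean 69 inches, SD 3 inches, correlation 0.5) rather than real data. It checks two things: the spread of the whole population stays the same from one generation to the next, and fathers near the mean have sons who are, on average, farther from the mean than they are.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MEAN, SD, R = 69.0, 3.0, 0.5  # illustrative assumptions

fathers = [random.gauss(MEAN, SD) for _ in range(100_000)]
sons = [
    MEAN + R * (f - MEAN) + random.gauss(0, SD * (1 - R**2) ** 0.5)
    for f in fathers
]

# The overall spread does not shrink between generations.
sd_fathers = statistics.pstdev(fathers)
sd_sons = statistics.pstdev(sons)

# Fathers within half an inch of the mean, paired with their sons.
near = [(f, s) for f, s in zip(fathers, sons) if abs(f - MEAN) < 0.5]
father_dev = statistics.mean(abs(f - MEAN) for f, _ in near)
son_dev = statistics.mean(abs(s - MEAN) for _, s in near)
son_avg = statistics.mean(s for _, s in near)

print(f"SD of fathers {sd_fathers:.2f}, SD of sons {sd_sons:.2f}")
print(f"Near-mean fathers: avg distance from mean {father_dev:.2f}")
print(f"Their sons: avg distance from mean {son_dev:.2f}, avg height {son_avg:.2f}")
```

The sons of average fathers scatter outward in both directions, so their average height stays put while their typical distance from the mean grows. That outward scatter is what refills the tails.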
In my psychology classes, regression toward the mean was explained this way: if you do really well on a test, i.e. better than you normally do, you probably won’t do as well on the next one. That doesn’t mean that you are slipping, it just means that if you’re far from your average on one test, you’ll be closer to your average on the next.
Suppose instead one particular student scores much more highly on a test than the class does as an average. You shouldn’t use regression toward the mean to conclude that she will score more closely to the class average on the next test, because she may be a better student. Her personal average, in that case, would be higher than the class average.
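A small simulation shows the difference between regressing toward your own average and regressing toward the class average. The model is an assumption made up for this sketch: each student has a fixed personal average (`abilities`), and each test score is that average plus random luck (`NOISE_SD`).

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

N_STUDENTS = 10_000
NOISE_SD = 6  # assumed test-to-test luck, in points

# Each student's personal long-run average (hypothetical distribution).
abilities = [random.gauss(75, 8) for _ in range(N_STUDENTS)]

def take_test(ability):
    """One test score: the student's personal average plus random luck."""
    return ability + random.gauss(0, NOISE_SD)

test1 = [take_test(a) for a in abilities]
test2 = [take_test(a) for a in abilities]
class_avg = statistics.mean(test1)

# Pick the top 5% of scorers on the first test.
cutoff = sorted(test1)[int(0.95 * N_STUDENTS)]
top = [i for i in range(N_STUDENTS) if test1[i] >= cutoff]

top_test1 = statistics.mean(test1[i] for i in top)
top_ability = statistics.mean(abilities[i] for i in top)
top_test2 = statistics.mean(test2[i] for i in top)

print(f"Class average on test 1:     {class_avg:.1f}")
print(f"Top scorers on test 1:       {top_test1:.1f}")
print(f"Their personal averages:     {top_ability:.1f}")
print(f"Same students on test 2:     {top_test2:.1f}")
```

The top scorers drop on the second test, but they drop to roughly their own personal averages, which are still well above the class average. They were partly lucky and partly just better students.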
I once saw Gould on C-SPAN talking about the OP’s question, sort of. Baseball is less likely to produce uberhigh batting averages than it was earlier in the century. Does this imply that baseball players aren’t as good as they were in the past? No, it is a reflection of the fact that better scouting, grooming, and training are reducing the variation between players by bringing more people nearer the limit of human ability.
Because we are nearer to the wall of human ability there will be fewer outliers on the really-good end of the batting scale, therefore we cannot conclude that players aren’t as good generally because we see fewer .400 batters. (Or whatever average; I don’t know baseball.)
As nutrition and education become more universally good (ugh, poor writing!), I’d imagine that we may see fewer exceptionally smart people because we are bringing everybody closer to the upper limit of human intelligence.
At least, that’s how I understood it from what he said…
I am not a statistician, but there seem to be two different scenarios involved. First, if you are drawing samples from a population which is known to be normally distributed, and you draw a sample more than a certain distance from the mean, your next sample is likely to be closer to the mean. But if you are using statistics to predict some actual random phenomenon, like how well you will do on a test or what the strength of a steel rod is, then all bets are off, because you can never know the distribution with a confidence level of 100%. In such cases, “regression to the mean” might apply in a rule-of-thumb sense, depending on how much data you had to base your prediction on, but there is nothing mathematical about it.
Here’s an absolutely nonsensical discussion of the topic. If you took the SAT 100 times and got 780 plus or minus 20, then took it once more and got 750, you would certainly not be more likely to score below 750 than above 750 on your next attempt. There may be an underlying assumption that the test taker is an average student, but if so, it isn’t stated.
Caveat Lector, but consider steel strength. Suppose your steel foundry produces steel that averages some measure of strength. Then one week, you find a distressingly high number of substandard steel samples. The boss comes down, yelling his head off at you, saying you need to straighten up the operations because he’s going broke.
If your operations are essentially the same, then next week you can expect an improvement in your test samples because they’ll regress toward the mean. If things are different, then the poor test samples may be the result of a change in inputs, procedures, or whatever. However, if nothing has changed, you can tell the boss that you’ll see to it and bet that next week he’ll be giving you an attaboy for fixing the problem.
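The "nothing changed, but next week looks better" effect can be sketched like this. The strength units and numbers are hypothetical, and the model assumes each week's result is an independent draw from the same unchanged process.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

PROCESS_MEAN = 400.0  # hypothetical average strength, arbitrary units
PROCESS_SD = 15.0     # hypothetical week-to-week variation

# Weekly test results from a process that never changes.
weeks = [random.gauss(PROCESS_MEAN, PROCESS_SD) for _ in range(20_000)]

# Find every "bad" week and pair it with the week that followed.
bad_cutoff = PROCESS_MEAN - 1.5 * PROCESS_SD
followups = [
    (weeks[i], weeks[i + 1])
    for i in range(len(weeks) - 1)
    if weeks[i] < bad_cutoff
]

bad_avg = statistics.mean(bad for bad, _ in followups)
next_avg = statistics.mean(nxt for _, nxt in followups)
improved = sum(nxt > bad for bad, nxt in followups) / len(followups)

print(f"Average bad week:          {bad_avg:.1f}")
print(f"Average following week:    {next_avg:.1f}")
print(f"Share of weeks that improved: {improved:.2f}")
```

The week after a bad week is almost always better, with no yelling required. The catch is the assumption at the top: if the bad week came from a real change in inputs or procedures, the draws are no longer from the same process and none of this applies.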
Suppose you are promoted and you now manage five factories. One of the factories comes back with steel samples that are horrible. Don’t expect that factory to start producing samples closer to the five-factory average, because it may have a substandard process or substandard suppliers.
In the context where I learned it, the common mistake was to assume that an exceptionally good student will become more average because of regression to the mean. But that’s not the case. A 4.0 student isn’t going to slide down to a 3.0 as a result of the rest of the class having a 3.0 average. But if you see a 3.0 student score a 3.9 and a 3.8, you can expect him to be closer to his 3.0 average in the future, inasmuch as the variation in his test scores is the product of random error and not something instrumental such as a better study method.
That’s the way I understand it. As I said, caveat lector.