As you probably know, I’m a teacher. I was recently looking at some statistics on the two most recent tests in my two Honors Algebra II classes. One thing I did was to make a scatter plot of the second-test scores versus the first-test scores and fit a least-squares regression line to it.
Now, these two data sets are measuring basically the same sort of thing (student ability in algebra and in test-taking), and on the same scale, so a priori I would expect the best-fit line to be something close to y=x (that is, I would expect each student to get approximately the same score on the second test as on the first). But when I looked at the graphs, the best-fit line for both classes had a slope close to 1/2 (and a y-intercept high enough to give about the same average).
The really odd part, though, came when I transposed the data: Instead of plotting the second test score on the Y axis versus the first test score on the X axis, I plotted the first test score on Y versus the second on X. And when I did this, the best-fit lines still had a slope close to 1/2.
Now, I do understand that standard regression techniques don’t treat X and Y symmetrically: the assumption is that the X variable is known perfectly and that all of the error is in the Y measurements. So this isn’t completely paradoxical. I suppose it might even be justifiable in terms of reversion to the mean: a student who got an extreme score on one test is likely to get a less extreme score on the other, simply because a score that extreme is unlikely to be repeated. But that would only properly apply to the portion of the measurement due to noise, not to the portion due to actual student ability.
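The effect is easy to reproduce with simulated scores. The following is a minimal sketch, not my actual class data: each simulated student has a fixed "ability," and each test score is that ability plus independent noise. The function names `ols_slope` and `simulate_scores`, the mean of 80, and the standard deviations of 7 are all arbitrary assumptions, chosen so that about half of the score variance is noise.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_scores(n=2000, seed=0):
    """Synthetic scores: shared ability plus independent noise per test.

    The numbers (mean 80, SD 7 for both ability and noise) are made up.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(80, 7) for _ in range(n)]
    test1 = [a + rng.gauss(0, 7) for a in ability]
    test2 = [a + rng.gauss(0, 7) for a in ability]
    return test1, test2

test1, test2 = simulate_scores()
# Regress second test on first, then first on second.
print("test2 on test1:", ols_slope(test1, test2))
print("test1 on test2:", ols_slope(test2, test1))
```

When the two tests have similar spread, each slope is roughly the correlation coefficient between them, so both directions give a slope below 1 whenever there is noise in the scores.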
Is there a name for this phenomenon? And is there a standard technique for avoiding it (say, a method that assumes that both measurements have errors, and in this case would assume that the errors in both are of comparable size)?
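For concreteness, the kind of symmetric method I have in mind might look like the sketch below. It minimizes perpendicular distances to the line instead of vertical ones, which implicitly assumes the errors in both scores have equal variance; fits of this kind are sometimes called orthogonal regression or total least squares. The function name `orthogonal_fit` and the sample scores are made up for illustration.

```python
import math

def orthogonal_fit(x, y):
    """Fit a line by minimizing perpendicular distances.

    Assumes equal error variance in x and y. Returns (slope, intercept).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    diff = syy - sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical scores for six students on two tests.
first = [62, 70, 75, 81, 88, 93]
second = [70, 68, 80, 79, 84, 90]

slope_fwd, _ = orthogonal_fit(first, second)
slope_rev, _ = orthogonal_fit(second, first)
# Unlike ordinary least squares, swapping the axes gives the reciprocal slope,
# so both fits describe the same line.
print(slope_fwd, slope_rev, slope_fwd * slope_rev)
```

The point of the design is that swapping X and Y describes the same line, which ordinary least squares does not do.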
For what it’s worth, the mean, median, and standard deviation are all fairly similar between the first test and second test.