Just got done teaching my intro stats class today. We went over how to find quartiles of data. I showed them the way I learned to do it (and how SAS software does it by default). But that way isn’t the way the textbook does it, and the two methods give different answers. So now I’ve probably confused the hell out of everyone.
Given the data:
60
65
65
70
75
75
75
80
80
80
85
85
85
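My method (and SAS’s, and this thing’s) says the first quartile (the 25th percentile) is 70, and the third quartile (75th percentile) is 80. You take j = n(p), where n is the number of observations and p is the desired quantile written as a proportion (0.25 for the first quartile). Here n = 13, so j = 3.25 for the first quartile, which rounds up to the 4th ordered observation, 70.
If the result is not a whole number, round j up and take that ordered observation.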
If the result is an integer, take the average of the j and (j+1)st ordered observations.
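For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule as described in the post above. The function name `quantile_np_rule` is made up for illustration, and this is only a reading of the post, not SAS's actual code.

```python
import math
from fractions import Fraction

def quantile_np_rule(data, p):
    """Quantile by the j = n*p rule from the post (p is a proportion, 0 < p < 1)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    # Fraction(str(p)) keeps n*p exact, so 0.1 * 10 is recognised as a whole number.
    j = Fraction(str(p)) * n
    if j.denominator != 1:
        # Not a whole number: round up and take that ordered observation.
        k = math.ceil(j)
        return xs[k - 1]
    # Whole number: average the j-th and (j+1)-st ordered observations.
    k = int(j)
    return (xs[k - 1] + xs[k]) / 2

data = [60, 65, 65, 70, 75, 75, 75, 80, 80, 80, 85, 85, 85]
q1 = quantile_np_rule(data, 0.25)  # j = 3.25, so the 4th observation
q3 = quantile_np_rule(data, 0.75)  # j = 9.75, so the 10th observation
print(q1, q3)
```

With the 13 observations above this prints 70 and 80, matching the values quoted in the post.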
The textbook, on the other hand, says find the median M. Then find the median of all observations less than M; that’s the 25th percentile. Then find the median of all observations greater than M; that’s the 75th percentile.
Well, the textbook is wrong where the data clump like this. To take an extreme case:
0,5,10,10,10,10,10,10,10,10,10,10,10,15,20
The median is 10. There are only two observations less than it (0 and 5), and two greater (15 and 20). So their method gives 2.5 as the 25th percentile, and 17.5 as the 75th percentile. I think the right results for each are 10 and 10. (Fewer than 25% of the observations are less than 10, and fewer than 25% are more than 10.)
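A quick sketch of that literal reading of the textbook rule, splitting by value (strictly less than or greater than the median). The function name `quartiles_split_by_value` is hypothetical, and the textbook may well intend a split by position instead.

```python
from statistics import median

def quartiles_split_by_value(data):
    """Median, then medians of the values strictly below and strictly above it."""
    m = median(data)
    lower = [x for x in data if x < m]  # values less than the median
    upper = [x for x in data if x > m]  # values greater than the median
    # median() raises StatisticsError if either side is empty (all values equal).
    return median(lower), m, median(upper)

clumped = [0, 5] + [10] * 11 + [15, 20]
q1, med, q3 = quartiles_split_by_value(clumped)
share_below_median = sum(x < med for x in clumped) / len(clumped)
print(q1, med, q3, share_below_median)
```

On the clumped data this gives 2.5, 10 and 17.5, even though only 2 of the 15 observations (about 13%) lie below 10.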
That’s odd. What the textbook says makes sense for quartiles, but how does it say to calculate (say) the 33rd percentile?
SAS seems to have 5 different ways to calculate percentiles according to the documentation! Nobody has ever bugged me about which method was used except back when I was also using Statview (a program that SAS Institute bought out from the original owners and then dropped in favour of JMP :(); as I recall Statview used a different method than SAS’s default.
The textbook is very basic; it doesn’t even go into finding percentiles for the general case. It just sticks with quartiles.
Were this really and truly my course, I’d just say “fuck the textbook” and do it my way. But I teach one of about 40 sections of this course, and it’s a coordinated course with a common final exam. We instructors are told to stick to the textbook, since the final exam will be comprehensive and from the test bank.
My seventh-grade textbook says roughly the same as yours. The kids need quartiles for box-and-whiskers graphs. Luckily they won’t remember anything I tell them by the time they have a stats class.
The textbook I recently taught from (Moore/McCabe/Craig) teaches basically the textbook method described in the OP, but gives a little more direction. In this case, the first quartile would be found by taking the median of the seven observations positioned below the median, 0,5,10,10,10,10,10, rather than 0,5.
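Here is a rough sketch of that split-by-position reading, assuming the median observation itself is left out of both halves when n is odd. The function name `quartiles_split_by_position` is made up for illustration and is not taken from the textbook.

```python
from statistics import median

def quartiles_split_by_position(data):
    """Quartiles as medians of the lower and upper halves of the ordered list."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]  # observations positioned below the median
    # For odd n, skip the middle observation; for even n, take the top half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

clumped = [0, 5] + [10] * 11 + [15, 20]
original = [60, 65, 65, 70, 75, 75, 75, 80, 80, 80, 85, 85, 85]
clumped_quartiles = quartiles_split_by_position(clumped)
original_quartiles = quartiles_split_by_position(original)
print(clumped_quartiles, original_quartiles)
```

On the clumped data both quartiles come out as 10. On the 13 observations from the first post the same rule gives 67.5 and 82.5.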
Excel calls the method you use, which gives whole numbers here, the “Inclusive quartile.” They call the “median of less/greater than median” method, which here gives decimal places, the “exclusive quartile.”
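Python's standard library has a `statistics.quantiles` function whose `method` argument takes the same two names. Both options interpolate between ordered observations. Whether they match Excel's two functions in every case is an assumption here, not something checked in this thread, so treat this as a sketch for comparison on the original 13 observations.

```python
from statistics import quantiles

data = [60, 65, 65, 70, 75, 75, 75, 80, 80, 80, 85, 85, 85]

# n=4 asks for the three cut points that divide the data into quarters.
inclusive = quantiles(data, n=4, method="inclusive")
exclusive = quantiles(data, n=4, method="exclusive")

print("inclusive:", inclusive)
print("exclusive:", exclusive)
```

Here the inclusive version returns 70, 75, 80 and the exclusive version returns 67.5, 75, 82.5.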
After a little research, I remembered that the method in the textbook is a modified version of Tukey’s method. The difference is that Tukey says (well, said, since he’s been dead for 10 years) to include the median M in the calculation of the quartiles.
So given the data
1 2 3 4 5,
the median is 3. Now find the median of the lower half, which includes the 3:
1 2 3. The median is 2, so the first quartile is 2.
The upper quartile is the median of
3 4 5, which is 4.
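The same idea as a short Python sketch, assuming that for an odd number of observations the median goes into both halves, as in the 1 2 3 4 5 example. The function name `quartiles_median_included` is hypothetical.

```python
from statistics import median

def quartiles_median_included(data):
    """Lower and upper quartiles as medians of halves that include the median."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # for odd n the middle observation lands in both halves
    lower = xs[:half]
    upper = xs[n - half:]
    return median(lower), median(upper)

small = quartiles_median_included([1, 2, 3, 4, 5])
original = quartiles_median_included(
    [60, 65, 65, 70, 75, 75, 75, 80, 80, 80, 85, 85, 85]
)
print(small, original)
```

This gives 2 and 4 for 1 2 3 4 5. For the 13 observations in the first post it gives 70 and 80, the same answers as the j = n(p) rule on that data.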
There is a surprising amount of literature on this topic, in case anyone is interested. Here’s a good starting point, a journal article reviewing the most common methods.