Help with "Box and Whisker" Plots

I’m tutoring a 7th grader in math, and even though I’m an English/Art teacher, I manage to do a good job of explaining concepts to my students. I’ve even dredged up long-ago memories of graphing equations and other abstruse topics. But every now and then, something shows up that I never encountered in all my years of math in public school. One of these is the Box and Whisker Plot.

Now, I get the concept. It’s a handy visualization of the range of a set of data points, giving you the minimum, the maximum, the median, and the lowest and highest quartiles. I had to do a quick review of median because I, literally, never use that in real life.

I get how to find the median of a set of data. Once the values are sorted, if the number of data points is odd, it’s the exact middle number. If the number of data points is even, you take the two middle numbers and find their mean.
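For reference, the median rule described above can be sketched in a few lines of Python. The function name `median_of` is just an illustrative choice, not something from the student's software.

```python
def median_of(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)          # the rule only works on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the exact middle number
    # even count: the mean of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `median_of([3, 6, 6, 9, 12, 14, 15])` gives 9, and `median_of([1, 2, 3, 4])` gives 2.5.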

What’s killing me is finding the lowest and highest quartile.

The lowest quartile is supposed to be the median of the values between the minimum point and the median point. The highest quartile is supposed to be the median of the values between the median point and the maximum point. Yet, every time I tried to solve the problem, I was off on one of them.

I figured out what the problem is: I don’t know whether or not to include the median point itself, or to use the range of numbers up to the median and exclude the median. And in the case of an even set of data points, do I exclude both middle numbers, or do I include the one middle number and exclude the median?

Math sites found on a Google search are not helping, because they either don’t show how the median for the lowest and highest quartile is determined, or they give conflicting information.

Math dopers, a little help?

Percentile: the value below which a certain percent of the values fall.

The lower quartile is the 25th percentile.
The median is the 50th percentile.
The upper quartile is the 75th percentile.

You have the median down pat, I think.
Let’s look at the 25th percentile.
Let’s say that you have 12 numbers. 25% of 12 is 3. So you are looking for a value that is above the 3rd smallest number, which would be between the 3rd and 4th smallest numbers.
Let’s say that you have 10 numbers. 25% of 10 is 2.5. So you are looking for a value that is above the 2.5th smallest number, which would be the 3rd smallest number.

Similarly for the 75th percentile.
Let’s say that you have 12 numbers. 75% of 12 is 9. So you are looking for a value that is above the 9th smallest number, which would be between the 9th and 10th smallest numbers.
Let’s say that you have 10 numbers. 75% of 10 is 7.5. So you are looking for a value that is above the 7.5th smallest number, which would be the 8th smallest number.

I think that I got that right.
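The counting rule in this post can be sketched as follows. This is only one reading of it: when the percentage of the count lands on a whole number k, take the value halfway between the kth and (k+1)th smallest numbers; otherwise round up and take that number. The name `percentile_by_position` is hypothetical, and the percent is passed as a whole number (25, 50, 75) to keep the arithmetic exact.

```python
def percentile_by_position(values, percent):
    """Percentile by counting positions in the sorted data (percent is an int, 1 to 99)."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    ordered = sorted(values)
    scaled = percent * len(ordered)   # 100 times the position, kept as an integer
    if scaled % 100 == 0:
        k = scaled // 100             # position is a whole number k
        # halfway between the kth and (k+1)th smallest values
        return (ordered[k - 1] + ordered[k]) / 2
    k = -(-scaled // 100)             # round the position up to the next whole number
    return ordered[k - 1]             # the kth smallest value
```

With the twelve values 1 through 12, the 25th percentile comes out as 3.5 (between the 3rd and 4th smallest). With the ten values 1 through 10, it comes out as 3 (the 3rd smallest).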

Divide the data into an upper half and a lower half. If the median is one of the data points (when you have an odd number of values), it is not included in either half; but when you have an even number of data values, those middle two values are included (one in each half). Then the quartiles are the medians of each half.
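A minimal sketch of that split-into-halves rule, assuming the median point is left out of both halves when the count is odd. The function name `quartiles_excluding_median` is made up for illustration; `statistics.median` is from Python's standard library.

```python
from statistics import median

def quartiles_excluding_median(values):
    """Return (Q1, Q2, Q3), leaving a single middle data point out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]               # everything below the middle
    upper = ordered[half + (n % 2):]     # skip the middle point only when n is odd
    return median(lower), median(ordered), median(upper)
```

For 3, 6, 6, 9, 12, 14, 15 this gives Q1 = 6, median = 9, Q3 = 14.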

Q2 is the median and splits the data into a lower half and an upper half. Q1 is the median of the lower half and Q3 is the median of the upper half. The big question comes up when you have an odd number of data points: do you include the median in the upper/lower halves or not? There’s no hard and fast rule, so different programs can give you different Q1 and Q3 in these cases.

Since a median is best defined if you have an odd number of points, you could choose whether or not to include the median point in the upper and lower halves in order to give them an odd number of points. So with 11 points total, you’d exclude it to get 5 points in each half, but with 13 points total, you’d include it to get 7 points in each half.
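The "keep each half odd" idea could be sketched like this. It is only an illustration of the suggestion above, not a standard rule, and the name `quartiles_odd_halves` is hypothetical. For an even number of points the sketch simply splits the data down the middle, which is an assumption borrowed from the earlier replies.

```python
from statistics import median

def quartiles_odd_halves(values):
    """Return (Q1, Q3), including the median point only if that makes each half odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        # even count: no single median point, so just split down the middle
        lower, upper = ordered[:half], ordered[half:]
    elif half % 2 == 1:
        # e.g. 11 points: excluding the median leaves 5 in each half
        lower, upper = ordered[:half], ordered[half + 1:]
    else:
        # e.g. 13 points: including the median gives 7 in each half
        lower, upper = ordered[:half + 1], ordered[half:]
    return median(lower), median(upper)
```

With the values 1 through 11 this returns (3, 9), and with 1 through 13 it returns (4, 10), so each quartile is always an actual data point when the count is odd.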

The answer that there is no hard and fast rule at least assures me that I’m sane.

The problem I definitely ran into is that the program the student is using doesn’t have a hard and fast rule either. While it always excluded the median if it came from a single data point, if there were two data points averaged to make the median, sometimes it included the middle number of that half, and sometimes not, and sometimes it did both in the same problem.

I wrote a note for the student to take with him to his teacher. It’s a digital academy, and I suspect the software they’re using is fairly new and slightly buggy. The student indicated his teacher would be willing to deal with the discrepancy and make the call either way.

Thanks, guys!

What I always taught my students was to do it consistently. Let’s say we have: 3, 6, 6, 9, 12, 14, 15. The median is 9 so I didn’t care if:
Q1= median of 3, 6, 6, 9; Q3= median of 9, 12, 14, 15
or
Q1= median of 3, 6, 6; Q3= median of 12, 14, 15
as long as they were consistent in including the median in both halves or neither and then consistent from problem to problem. You shouldn’t have this issue with an even number of data points.
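To see how the two choices play out on that data set, here is a small sketch. The function `halves_quartiles` and its `include_median` flag are names invented for this example.

```python
from statistics import median

def halves_quartiles(values, include_median):
    """Return (Q1, Q3) using the same include/exclude choice for both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        # the middle point goes into both halves
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # odd count: the middle point is dropped; even count: a plain split
        lower, upper = ordered[:half], ordered[half + (n % 2):]
    return median(lower), median(upper)

data = [3, 6, 6, 9, 12, 14, 15]
with_median = halves_quartiles(data, include_median=True)      # Q1 = 6, Q3 = 13
without_median = halves_quartiles(data, include_median=False)  # Q1 = 6, Q3 = 14
```

Here Q1 is 6 either way, but Q3 is 13 with the median included and 14 with it excluded, which is the kind of mismatch the original question ran into.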