I have seen this both ways, so I’m wondering if there is a technically correct answer and, if not, what the consensus is.
Assume an odd number of data points that you want to split into a low half and a high half. Does the median go into each half, or into neither half? In other words, for the data set 1 2 3 4 5 6 7, which is correct?
Low: 1 2 3 4 High: 4 5 6 7
Low: 1 2 3 High: 5 6 7
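For concreteness, here is a minimal Python sketch of the two conventions. `split_halves` and `quartiles_by_halves` are hypothetical helper names for illustration, not a library API, and Q1 and Q3 are assumed to be the medians of the low and high halves.

```python
from statistics import median

def split_halves(data, include_median):
    """Split sorted data into low and high halves.

    For an odd count, include_median=True puts the middle value in both
    halves; include_median=False leaves it out of both.
    """
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 0:
        # Even count: the halves are unambiguous.
        return s[:mid], s[mid:]
    if include_median:
        return s[:mid + 1], s[mid:]
    return s[:mid], s[mid + 1:]

def quartiles_by_halves(data, include_median):
    """Return (Q1, Q3) as the medians of the low and high halves."""
    low, high = split_halves(data, include_median)
    return median(low), median(high)

data = [1, 2, 3, 4, 5, 6, 7]
print(split_halves(data, True))          # ([1, 2, 3, 4], [4, 5, 6, 7])
print(split_halves(data, False))         # ([1, 2, 3], [5, 6, 7])
print(quartiles_by_halves(data, True))   # (2.5, 5.5)
print(quartiles_by_halves(data, False))  # (2, 6)
```

Both conventions are internally consistent; they simply give different quartiles whenever the count is odd.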
FWIW, I think including it in each half is the correct way.
If anything, I want my definition of Q1 and Q3 to match Google Sheets for my students, and Sheets matches my understanding of what happens to a median when a set is split. I just want that confirmed, or, if 90% of the math Dopers here say, “OMG! You and Google Sheets are completely wrong!”, I want to know that as well.
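If you want something reproducible to compare a spreadsheet against, Python’s standard library has `statistics.quantiles`, which offers an `"inclusive"` and an `"exclusive"` interpolation method. My understanding is that Google Sheets’ `QUARTILE` uses the same linear-interpolation rule as the inclusive method, but treat that correspondence as an assumption and spot-check it in Sheets.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# "inclusive": linear interpolation over positions 0..n-1.
print(quantiles(data, n=4, method="inclusive"))  # [2.5, 4.0, 5.5]

# "exclusive" (the default): positions are scaled to n + 1 instead.
print(quantiles(data, n=4, method="exclusive"))  # [2.0, 4.0, 6.0]
```

For this particular data set, the inclusive method reproduces the “median in both halves” quartiles (2.5 and 5.5) and the exclusive method reproduces the “median in neither half” quartiles (2 and 6).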
I don’t think there is a ‘technically correct’ answer here.
Define your split however you prefer, or however works best for your purpose, and that becomes the correct answer. Just make sure your definition is unambiguous.
For finance, my understanding is that including the median in both sets can be a good idea, but not every piece of software will necessarily work that way.
I’m trying to think of ready examples where a median could be split. I did examples in statistics class in college but can’t recall one like that, and I was a very attentive student. As the last person said, finance software probably wouldn’t allow for it, but I’m trying to think of other examples.
I’d argue that you can’t split an odd number of items into halves. Nor can you assign discrete items into more than one half.
Try looking at the problem differently. An odd number of discrete items can be put into equal-sized below-median and above-median sets. This has the advantage of being generalizable: it works for 1 2 3 4 4 4 4 4 5 6 7 and for 1 2 3 39 65 66 67.
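A quick sketch of that positional split, assuming the input has an odd number of items: sort, drop the middle position, and keep what is left on either side. `below_above` is just an illustrative name.

```python
def below_above(data):
    """Split an odd-length list into equal-sized sets either side of the median position."""
    s = sorted(data)
    if len(s) % 2 == 0:
        raise ValueError("expected an odd number of items")
    mid = len(s) // 2
    # Everything before the middle position, and everything after it.
    return s[:mid], s[mid + 1:]

print(below_above([1, 2, 3, 4, 4, 4, 4, 4, 5, 6, 7]))  # ([1, 2, 3, 4, 4], [4, 4, 5, 6, 7])
print(below_above([1, 2, 3, 39, 65, 66, 67]))           # ([1, 2, 3], [65, 66, 67])
```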
Not if you mean “below” and “above” strictly. If your set is, say, 1 2 2 2 3 4, then the median is 2, the “below median” is 1, and the “above median” is 3 4.
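The strict reading can be sketched by comparing each value with the median instead of splitting by position; with ties, the two sets need not be the same size. `strictly_below_above` is an illustrative name.

```python
from statistics import median

def strictly_below_above(data):
    """Return the median plus the values strictly below and strictly above it."""
    m = median(data)
    below = [x for x in data if x < m]
    above = [x for x in data if x > m]
    return m, below, above

print(strictly_below_above([1, 2, 2, 2, 3, 4]))  # (2.0, [1], [3, 4])
```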
Include half of the median element in each. That is, if we have the set [1 2 3 4 5 6 7], then the lower half has weighting factors of [1 1 1 0.5 0 0 0] and the upper half has [0 0 0 0.5 1 1 1].
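A minimal sketch of the half-weighting idea, computing a weighted mean for each half with the weights given above. `weighted_mean` is a hypothetical helper; `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

data = [1, 2, 3, 4, 5, 6, 7]
half = Fraction(1, 2)
low_w = [1, 1, 1, half, 0, 0, 0]
high_w = [0, 0, 0, half, 1, 1, 1]

print(sum(low_w), weighted_mean(data, low_w))    # 7/2 16/7
print(sum(high_w), weighted_mean(data, high_w))  # 7/2 40/7
```

Each half carries 3.5 units of weight, so together the two halves still count every data point exactly once.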
IANA statistician, not by a long shot. So huge salt farm–sized blocks of salt should be applied here.
I’ll suggest that, beyond the definitional question of what “the (uppercase M) Median” means, the real right answer to the OP’s question probably depends on why he’s trying to split a set along the median “axis” into upper and lower halves. What then? What do you do with those sets?
@Dr.Strangelove just above has an appealing notion, if you’re going to be combining the two halves somehow later and don’t want to double-count that distinct single median entry. But if the two halves have entirely separate futures, that really makes a hash of a question like “What is the arithmetic mean of the lower half? The upper half?”
1 2 2 2 3 4 is an even number of objects. If you add a 2 to make 1 2 2 2 2 3 4, you can divide it into three below-median terms and three above-median terms. True, some of them are equal to the median and to each other, and that wouldn’t be a preferred way to describe them, but they’re still equal-sized sets of discrete items, and that’s better than having the same item in two halves.
On the contrary; it makes it easier to solve. It’s just that you have 3.5 elements in each.
Just consider it like the integral of a function that is constant on each region between an integer n and n+1. If the total number of regions is odd, then we have to slice the middle region in half, but that’s no different from a slice that falls between two regions. An arithmetic average is just $\frac{1}{b-a}\int_a^b f(x)\,dx$, no matter whether a or b are integers.
ETA: Looks like you edited your reply, but that doesn’t change the answer. We can take the integral between 0 and 1.75 if we wish, or 1.75 and 3.5, etc.
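The integral view can be checked with a small sketch: treat the sorted data as a step function equal to data[k] on [k, k+1) and integrate it exactly over any interval [a, b]. `step_integral` and `step_mean` are illustrative names, and the code assumes 0 <= a < b <= len(data).

```python
import math
from fractions import Fraction

def step_integral(data, a, b):
    """Exact integral over [a, b] of the step function equal to data[k] on [k, k+1).

    Assumes 0 <= a < b <= len(data).
    """
    total = Fraction(0)
    k = math.floor(a)
    while k < b:
        # Overlap of [a, b] with the k-th unit step.
        lo = max(a, k)
        hi = min(b, k + 1)
        total += data[k] * (hi - lo)
        k += 1
    return total

def step_mean(data, a, b):
    """Average value of the step function over [a, b]."""
    return step_integral(data, a, b) / (b - a)

data = [1, 2, 3, 4, 5, 6, 7]
half = Fraction(7, 2)
print(step_mean(data, 0, half))            # 16/7 (lower half)
print(step_mean(data, half, 7))            # 40/7 (upper half)
print(step_mean(data, 0, Fraction(7, 4)))  # 10/7
```

Slicing the middle step at 3.5 gives the same means as weighting the median 0.5 in each half, and nothing stops you from using a non-integer bound such as 1.75.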
Thank you. Makes sense explained that way. As I said, I’m probably stupid for even posting in stats threads.
Yes, I edited my reply. I realised that my “/” might be misinterpreted as division, when I meant it to mark alternative words in the sentence, like “[lower | upper]” in BNF.
To be honest, I’ve no idea if anyone actually does this. It’s how I’d do it if it were really important to keep track of things carefully, but it really only matters if you have a small number of elements and can’t depend on the average smoothing things out.
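To put a rough number on “it really only matters if you have a small number of elements”, here is a sketch under a strong simplifying assumption: the data are evenly spaced, 1 through n for odd n. It reports the gap between the two Q1 conventions as a fraction of the data range. `q1_gap_fraction` is an illustrative name.

```python
from statistics import median

def q1_gap_fraction(n):
    """For data 1..n (n odd): (Q1 with median included - Q1 without) / range."""
    data = list(range(1, n + 1))
    mid = n // 2
    q1_incl = median(data[:mid + 1])  # low half includes the median
    q1_excl = median(data[:mid])      # low half excludes the median
    return (q1_incl - q1_excl) / (n - 1)

for n in (3, 7, 101, 1001):
    print(n, q1_gap_fraction(n))
```

On evenly spaced data the gap is always half a step, so as a share of the range it shrinks like 1/(2(n-1)). With lumpy real data the gap can be much larger, so this is only a rough guide.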
My gut reaction is that if where you place a single data point makes a significant difference in your final outcome, either you have too few data or too fragile an algorithm. The solutions are to get more data or to devise a more robust algorithm.
I’ve thought about the extreme case of 3 data points. If the median does not go into each half, then Q1 is the smallest value and Q3 is the largest value.
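The three-point case is easy to check with a sketch; the values 10, 20, 50 are arbitrary example data, and Q1 and Q3 are taken as the medians of the halves.

```python
from statistics import median

data = [10, 20, 50]  # arbitrary example values, already sorted

# Median excluded: each half is a single point.
q1_excl, q3_excl = median(data[:1]), median(data[2:])
# Median included: each half is two points.
q1_incl, q3_incl = median(data[:2]), median(data[1:])

print(q1_excl, q3_excl)  # 10 50
print(q1_incl, q3_incl)  # 15.0 35.0
```

Excluding the median makes Q1 and Q3 the minimum and maximum, so the IQR equals the full range; including it puts each quartile halfway between an extreme and the median.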