I have a list of 15 applicants for an event with 5 judges. Each judge scores the applicants individually, and the scores will be added together for a final number. It turns out that one judge cannot provide a score for one of the applicants.
How do I best provide the fairest total score given the missing data point? I could just average the other 4 judges, but since the judges score independently, there are individual differences between them (e.g. a great score from one judge might be a 9, whereas a similarly great score from another judge might be a 6).
I’ve discovered ‘regression imputation’, which might be a possibility. Should I average the 4 judges’ scores for each applicant and plot them against the scores of the one judge with the missing data point, then use a regression line to calculate that data point?
Looking for help, and justification…money is involved.
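For reference, the regression idea described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `regression_impute` and the sample numbers are hypothetical, and a straight least-squares line is assumed to be an adequate fit for 14 points.

```python
from statistics import mean

def regression_impute(other_means, judge_scores, x_new):
    """Fit judge_scores ~ intercept + slope * other_means by ordinary
    least squares, then predict the judge's score at x_new.

    other_means  -- mean of the other 4 judges for each fully scored applicant
    judge_scores -- the fifth judge's score for those same applicants
    x_new        -- mean of the other 4 judges for the unscored applicant
    """
    mx, my = mean(other_means), mean(judge_scores)
    sxx = sum((x - mx) ** 2 for x in other_means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(other_means, judge_scores))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * x_new

# Hypothetical data (not from the actual event): this judge scores
# consistently 1.5 times the other judges' mean.
other_means = [4, 5, 6, 7]
judge_scores = [6, 7.5, 9, 10.5]
predicted = regression_impute(other_means, judge_scores, 8)
```

Note that the prediction can land outside the scoring scale, so it may need to be capped.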
How about multiplying the 14 judges combined score by 1/14th in order to compensate for the missing judge - you can’t replicate the missing judge subjectively so that methodology seems okay.
You mean multiply the remaining 4 judges’ scores by 1/4th (not sure what you mean by 14 judges…there are 15 applicants and 5 judges)? That is just the average of the remaining judges. I guess I don’t know what you mean by this.
As a first cut, I would treat the “missed measurement” as the average of scores that judge gave for the other 14 applicants. That would avoid helping (if the judge was a typical low scorer) or hindering (if the judge was a typical high scorer) that applicant.
(Well, actually as a first cut I would do what TimeWinder suggests, but assuming that is not feasible…)
There is no end to how deep you can go with the modelling of this: maybe we can establish from the other candidates that this judge’s scores tend to be correlated with another judge’s scores, and infer a different value based on that (see collaborative filtering). But the more complicated it gets, the more you open yourself up to accusations of unfairness.
–Find the overall average for each judge. For 4 judges, this will be the average of 15 applicants, while the other will be the average of 14 applicants.
–Convert the score for each applicant/judge combination to the difference in the applicant score from that judge’s average score. One applicant/judge combination will be missing, ignore it.
–Take the overall average of the converted score for each applicant. That is what you are comparing.
Example: Judge 1 has an average score of 6 out of 10 for the applicants. The score that judge 1 gave applicant 1 is a 7. The converted score to be used going forward is 1.
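A minimal sketch of the three steps above, assuming scores are held as one list per judge with `None` for the missing entry. The function name `centered_totals` and the sample numbers are hypothetical.

```python
from statistics import mean

def centered_totals(scores):
    """scores[j][a] is judge j's score for applicant a, or None if missing.

    Returns, for each applicant, the mean of (score - that judge's own
    average), skipping any missing judge/applicant combination.
    """
    # Step 1: each judge's average over the applicants they actually scored.
    judge_means = [mean(s for s in row if s is not None) for row in scores]
    n_applicants = len(scores[0])
    result = []
    for a in range(n_applicants):
        # Step 2: convert each score to its difference from the judge's mean.
        diffs = [row[a] - m
                 for row, m in zip(scores, judge_means)
                 if row[a] is not None]
        # Step 3: average the converted scores for this applicant.
        result.append(mean(diffs))
    return result

# Hypothetical data: 3 judges, 3 applicants; the last judge did not
# score the last applicant.
scores = [
    [7, 5, 6],
    [9, 7, 8],
    [4, 2, None],
]
converted = centered_totals(scores)
```

Because the comparison uses an average rather than a sum, the applicant with only 4 scores is not penalized for having one fewer.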
That was my first thought. But I think you’ll end up with the same (or very close) result if you do:
<J1-J4 score for A15> * (<J5 score for A1-A14> / <J1-J4 score for A1-A14>)
where J and A denote judges and applicants, J5 is the judge who could not provide a score for A15. <> denote averages. The first term is just the average score for A15 of the 4 judges. The rest is the measure of J5’s bias, whether he consistently scored the other 14 applicants higher/lower than other judges.
Which, if you replace the words “judges” with “viewers”, “judge” with “submit ratings for”, and “contestants” with “movies”, becomes a problem that has been well-studied by Netflix in recent years.
For a quick and easy analysis here is what I would do.
I assume that for most of the applicants you have all 5 judges.
First calculate values S1, S2,…,S5 as the sum of each judge’s scores over all applicants for which you have complete data.
Assume that for the given applicant you have values X1, X2, X3, X4 but are missing value X5.
For the 5th judge I would use the value
X5= S5*(X1+X2+X3+X4)/(S1+S2+S3+S4)
Basically the average weighted to take into account the fact that judge 5 may have particularly high or low scores.
It is possible that this imputed value may fall outside the range of possible values, in which case you should set it equal to the highest or lowest possible value.
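A short sketch of this calculation, including the capping step. The function name `impute_ratio`, the 0 to 10 scale limits, and the sample numbers are assumptions for illustration only.

```python
def impute_ratio(complete_rows, partial, lo=0, hi=10):
    """Impute judge 5's missing score as S5*(X1+X2+X3+X4)/(S1+S2+S3+S4).

    complete_rows -- one [x1, x2, x3, x4, x5] list per fully scored applicant
    partial       -- [x1, x2, x3, x4] for the applicant judge 5 did not score
    lo, hi        -- assumed limits of the scoring scale
    """
    # S1..S5: each judge's total over the fully scored applicants.
    sums = [sum(col) for col in zip(*complete_rows)]
    x5 = sums[4] * sum(partial) / sum(sums[:4])
    # Cap the result so it stays on the scoring scale.
    return min(hi, max(lo, x5))

# Hypothetical data: judge 5 scores 1.5 times as high as the others.
rows = [
    [6, 6, 6, 6, 9],
    [4, 4, 4, 4, 6],
]
typical = impute_ratio(rows, [5, 5, 5, 5])
capped = impute_ratio(rows, [8, 8, 8, 8])
```

When the same fully scored applicants are used, this works out algebraically to the same value as the ratio-of-averages formula given earlier in the thread, since the counts of judges and applicants cancel.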
There are other more complicated methods for imputing the missing data but those require a great deal of modeling. For your purposes I think this will be fine.
Policywise, I think you should present this idea to your ‘group’ (if you have one) for buy-in, acceptance, and voting prior to making the calculation and finding a result.
If this is an event which has any chance of recurring, you should get it in the procedures now, as well.
And, if there are multiple viable ways of doing the “correction”, do several of them and see whether they change the outcome in a material way. If they don’t, then the specific choice you make will be less controversial.
Yeah, I’d agree. But I’m not sure- then you’d know which method yields which outcome. It’s great if the outcome is unchanged, but how do you “unring the bell” once you’ve done the calculations?
I guess you could make a spreadsheet to do the various methods and replace individual names with numbers so you don’t (easily) know who may be benefitting from your decision. That would work better if you had a trusted person to enter the data.
Since there is only one applicant with a missing score, I think it is pretty clear who stands to win or lose based on the specific method chosen to fill in the data. Blinding the experiment at this point is probably not that useful.
It really depends on what you plan to do with the scores. If you are going to pick the best one, then it doesn’t matter if changing the resolution strategy moves that person from 12th to 8th place. If you are going to pick the top ten, it does.
Hell, maybe even if you assume the missing data point is 10/10 it doesn’t put that person into contention. Knowing that would make Sigene’s life a lot easier (and finding out it’s not true doesn’t make his life any harder).
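That bounds check is easy to automate. A rough sketch, assuming a 0 to 10 scale and final totals that are simple sums; the names `rank_of` and `rank_range` and the sample totals are hypothetical.

```python
def rank_of(totals, name):
    """1-based rank by descending total; ties share the better rank."""
    return 1 + sum(1 for t in totals.values() if t > totals[name])

def rank_range(totals_others, partial_total, lo=0, hi=10, name="missing"):
    """Best and worst possible rank for the applicant with one missing score.

    totals_others -- full 5-judge totals for every other applicant
    partial_total -- the 4-judge total for the applicant in question
    lo, hi        -- assumed limits of the scoring scale
    """
    worst = rank_of({**totals_others, name: partial_total + lo}, name)
    best = rank_of({**totals_others, name: partial_total + hi}, name)
    return best, worst

# Hypothetical totals, not the real event data.
others = {"A": 40, "B": 35, "C": 20}
fixed = rank_range(others, 22)    # rank is the same at both extremes
movable = rank_range(others, 28)  # rank depends on the missing score
```

If the best and worst ranks fall on the same side of the cutoff, the choice of imputation method cannot change the outcome.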
For definiteness, let Wally be the judge who didn’t score an applicant named Fred.
It’s interesting to see what happens with various proposals in extreme cases. (Yes, the actual data aren’t so extreme, but it still might help understanding.)
Suppose all the judges give exactly the same scores for everyone, with Fred in first place (except from Wally), Bill a close second, and the others far behind. Leahcim’s policy will push Bill into 1st place.
Suppose the other judges all rank the candidates closely, but in a distinct order with Albert first and Fred last. Wally demonstrates his enthusiasm for Albert with a huge score (and of course doesn’t score Fred at all). Fred, despite having nothing but worst scores, can end up in 2nd place.
Thus my own proposal, which should avoid most charges of unfairness:
Rank the 15 applicants using all the scores except Wally’s. Call Fred’s rank R_F; this will be Fred’s final rank. Rank the other 14 applicants using all the judges’ scores; use these for all the other ranks R_x. (Of course each such rank with R_x ≥ R_F must be replaced with R_x+1 to make room for Fred.) Note that, except for Fred, the rankings from 5 judges might be completely opposite from the rankings from 4 judges!
If a numeric score is needed, the midpoint among the scores which preserve that R_F can be chosen. Left as an unsolved problem: Find an arithmetic procedure for achieving this R_F stability without explicitly using rank order.
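A sketch of how this ranking proposal might be coded, assuming scores are stored as a dict of judge to dict of applicant to score, with Fred simply absent from Wally’s dict. The function name `final_order` and the sample data are hypothetical, and ties are broken arbitrarily by input order, which the proposal does not specify.

```python
def final_order(scores, judge, applicant):
    """Return applicant names, best first.

    The applicant with the missing score is placed by rank using only
    the other judges; everyone else is ordered by their full totals.
    """
    other_judges = [j for j in scores if j != judge]
    everyone = list(scores[other_judges[0]])
    # R_F: rank among all applicants using totals without the missing judge.
    partial = {a: sum(scores[j][a] for j in other_judges) for a in everyone}
    r_f = 1 + sum(1 for a in everyone if partial[a] > partial[applicant])
    # Everyone else: order by totals over all judges.
    rest = [a for a in everyone if a != applicant]
    full = {a: sum(scores[j][a] for j in scores) for a in rest}
    order = sorted(rest, key=lambda a: full[a], reverse=True)
    # Inserting at position R_F shifts every later applicant down by one.
    order.insert(r_f - 1, applicant)
    return order

# Hypothetical data modelled on the second extreme case above: Wally gives
# Albert a huge score, but Fred still finishes last.
scores = {
    "J1": {"Albert": 7, "Bill": 6, "Fred": 5},
    "J2": {"Albert": 7, "Bill": 6, "Fred": 5},
    "Wally": {"Albert": 10, "Bill": 1},
}
order = final_order(scores, "Wally", "Fred")
```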