Statistics: judging scores with one data point missing

Sigene · January 17, 2017, 3:29pm

I have a list of 15 applicants for an event with 5 judges. Each judge scores the applicants individually, and the scores will be added together for a final number. It turns out that one judge cannot provide a score for one of the applicants.

How do I best provide the fairest total score given the missing data point? I could just average the other 4 judges, but since the judges score independently, there are individual differences between the judges (e.g. a great score from one judge might be a 9, whereas a similarly great score from another judge might be a 6).

I’ve discovered ‘regression imputation’, which might be a possibility. Should I average the 4 judges’ scores for each applicant, plot those averages against the scores from the judge with the missing data point, and then use a regression line to calculate that data point?
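The regression idea in the question can be sketched in Python as follows. This is a minimal sketch, not a vetted statistical procedure: it assumes the scores are stored as one list per applicant with `None` in the single missing slot (a hypothetical layout), uses the other judges’ average as the predictor, and fits an ordinary least-squares line over the applicants that the missing judge did score. The function name is illustrative.

```python
def regression_impute(scores, missing_applicant, missing_judge):
    """Predict the missing judge's score from the other judges' average.

    scores: one list of judge scores per applicant (hypothetical layout);
    the single missing entry is scores[missing_applicant][missing_judge] = None.
    """
    xs, ys = [], []  # x = other judges' mean, y = missing judge's score
    for i, row in enumerate(scores):
        if i == missing_applicant:
            continue
        others = [s for j, s in enumerate(row) if j != missing_judge]
        xs.append(sum(others) / len(others))
        ys.append(row[missing_judge])
    # Ordinary least-squares fit of y = intercept + slope * x
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ZeroDivisionError if every applicant has the same average
    intercept = mean_y - slope * mean_x
    # Plug in the incomplete applicant's own average from the other judges
    row = scores[missing_applicant]
    x_new = sum(s for j, s in enumerate(row) if j != missing_judge) / (len(row) - 1)
    return intercept + slope * x_new
```

With only 14 points the fitted line can be noisy, and the prediction can land outside the scoring scale, so it may need clipping to the lowest or highest possible score.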

Looking for help, and justification…money is involved.

TimeWinder · January 17, 2017, 3:34pm

Ignore the missing judge for everybody.
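Ignoring the judge for everybody amounts to totalling over the remaining judges for all applicants, so every total is built from the same panel. A minimal Python sketch, assuming a hypothetical one-list-per-applicant layout with `None` for the missing score:

```python
def totals_without_judge(scores, dropped_judge):
    """Sum each applicant's scores over every judge except `dropped_judge`.

    Dropping the same judge for all applicants keeps the totals comparable.
    """
    return [sum(s for j, s in enumerate(row) if j != dropped_judge) for row in scores]
```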

up_the_junction · January 17, 2017, 3:36pm

How about multiplying the 14 judges combined score by 1/14th in order to compensate for the missing judge - you can’t replicate the missing judge subjectively so that methodology seems okay.

Sigene · January 17, 2017, 3:44pm

You mean multiply the remaining 4 judges’ scores by 1/4th (not sure what you mean by 14 judges…there are 15 applicants and 5 judges)? That is just the average of the remaining judges. I guess I don’t know what you mean by this.
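Filling the gap with the average of the remaining judges is equivalent to scaling that applicant’s four-judge sum by 5/4, so the total sits on the same scale as the complete five-judge totals. A minimal sketch, assuming one applicant’s scores are a list with `None` for the missing entry (the helper name is hypothetical):

```python
def fill_with_applicant_mean(row):
    """Replace the single None in one applicant's scores with the mean of the others.

    The resulting total equals the remaining judges' sum scaled by n / (n - 1).
    """
    given = [s for s in row if s is not None]
    fill = sum(given) / len(given)
    return [fill if s is None else s for s in row]
```

For example, `[8, 6, 7, 9, None]` becomes `[8, 6, 7, 9, 7.5]`, whose total of 37.5 equals 30 × 5/4. As the question points out, this ignores whether the missing judge is a harsh or generous scorer.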

septimus · January 17, 2017, 3:51pm

Do you need an actual score, or would rank order be good enough?

leahcim · January 17, 2017, 3:55pm

As a first cut, I would treat the “missed measurement” as the average of scores that judge gave for the other 14 applicants. That would avoid helping (if the judge was a typical low scorer) or hindering (if the judge was a typical high scorer) that applicant.
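leahcim’s first cut (use the missing judge’s own average over the other 14 applicants) can be sketched like this, assuming the same hypothetical layout of one score list per applicant:

```python
def impute_judge_mean(scores, missing_applicant, missing_judge):
    """Fill the gap with the average that judge gave every other applicant."""
    given = [row[missing_judge] for i, row in enumerate(scores) if i != missing_applicant]
    return sum(given) / len(given)
```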

(Well, actually as a first cut I would do what TimeWinder suggests, but assuming that is not feasible…)

There is no end to how deep you can go with the modelling of this: maybe we can establish from the other candidates that this judge’s scores tend to be correlated with another judge’s scores and infer a different value based on that (see collaborative filtering), but the more complicated it gets, the more you open yourself up to accusations of unfairness.

up_the_junction · January 17, 2017, 4:02pm

Sorry, that’s exactly what I mean

mcgato · January 17, 2017, 4:30pm

– Find the overall average for each judge. For 4 judges, this will be the average over 15 applicants, while for the other judge it will be the average over 14 applicants.
– Convert the score for each applicant/judge combination to the difference between the applicant’s score and that judge’s average score. One applicant/judge combination will be missing; ignore it.
– Take the overall average of the converted scores for each applicant. That is what you are comparing.

Example: Judge 1 has an average score of 6 out of 10 for the applicants. The score that judge 1 gave applicant 1 is a 7. The converted score to be used going forward is 1.
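The three steps above can be sketched in Python as follows. This assumes a hypothetical layout of one score list per applicant with `None` in the missing slot; the function name is illustrative.

```python
def centered_averages(scores):
    """Average each applicant's deviations from the judges' own mean scores."""
    n_judges = len(scores[0])
    # Step 1: each judge's average over the applicants that judge actually scored
    judge_means = []
    for j in range(n_judges):
        given = [row[j] for row in scores if row[j] is not None]
        judge_means.append(sum(given) / len(given))
    # Steps 2 and 3: convert to deviations, skip the missing entry, then average
    result = []
    for row in scores:
        diffs = [s - judge_means[j] for j, s in enumerate(row) if s is not None]
        result.append(sum(diffs) / len(diffs))
    return result
```

The applicant with the missing score is averaged over four deviations instead of five, which is what the second step’s “ignore it” implies.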

scr4 · January 17, 2017, 4:42pm

That was my first thought. But I think you’ll end up with the same (or very close) result if you do:

<J1-J4 score for A15> * (<J5 score for A1-A14> / <J1-J4 score for A1-A14>)

where J and A denote judges and applicants, and J5 is the judge who could not provide a score for A15; <> denotes an average. The first term is just the 4 judges’ average score for A15. The ratio measures J5’s bias: whether he consistently scored the other 14 applicants higher or lower than the other judges did.
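scr4’s formula translates directly into Python. A minimal sketch, assuming a hypothetical one-list-per-applicant layout; the comments map each mean to the bracketed terms above.

```python
from statistics import mean


def ratio_impute(scores, missing_applicant, missing_judge):
    """<other judges for A15> * (<J5 for others> / <other judges for others>)."""
    row = scores[missing_applicant]
    # <J1-J4 score for A15>
    applicant_mean = mean(s for j, s in enumerate(row) if j != missing_judge)
    rest = [r for i, r in enumerate(scores) if i != missing_applicant]
    # <J5 score for A1-A14>
    judge_mean = mean(r[missing_judge] for r in rest)
    # <J1-J4 score for A1-A14>
    panel_mean = mean(s for r in rest for j, s in enumerate(r) if j != missing_judge)
    return applicant_mean * judge_mean / panel_mean
```

If the missing judge scores everyone else at half the panel’s level, the imputed value is half the panel’s average for the incomplete applicant.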

DrCube · January 17, 2017, 4:55pm

If one out of five judges can’t judge, then you only have four judges in fact. Adjust accordingly.

Chronos · January 17, 2017, 5:03pm

Where it gets really thorny is when you have a number of judges, but no single judge is able to judge all of the contestants.

Sigene · January 17, 2017, 5:15pm

I think I like this.

leahcim · January 17, 2017, 5:19pm

Which, if you replace the words “judges” with “viewers”, “judge” with “submit ratings for”, and “contestants” with “movies”, becomes a problem that has been well-studied by Netflix in recent years.

Buck_Godot · January 17, 2017, 5:21pm

For a quick and easy analysis, here is what I would do.

I assume that for most of the applicants you have all 5 judges.

Calculate values S1, S2, …, S5 as the sum of each judge’s scores over all applicants for which you have complete data.

Suppose that for the given applicant you have values X1, X2, X3, X4 but are missing value X5.

For the 5th judge I would use the value

X5= S5*(X1+X2+X3+X4)/(S1+S2+S3+S4)

Basically, this is the average of the other four judges, weighted to account for the fact that judge 5 may give particularly high or low scores.

It is possible that this imputed value falls outside the range of possible values, in which case you should set it equal to the highest or lowest possible value.
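Buck_Godot’s recipe, including the clipping step, can be sketched as below. The score range `lo=0, hi=10` is an assumption (the thread never states the scale), and the layout of one list per applicant with `None` for the gap is hypothetical.

```python
def weighted_impute(scores, missing_applicant, missing_judge, lo=0, hi=10):
    """X5 = S5 * (X1 + ... + X4) / (S1 + ... + S4), clipped to [lo, hi]."""
    # S1..S5: per-judge sums over applicants with complete data
    complete = [row for row in scores if None not in row]
    s = [sum(row[j] for row in complete) for j in range(len(scores[0]))]
    s_others = sum(v for j, v in enumerate(s) if j != missing_judge)
    x = scores[missing_applicant]
    x_others = sum(v for j, v in enumerate(x) if j != missing_judge)
    value = s[missing_judge] * x_others / s_others
    # Clip to the possible score range, as suggested above
    return min(max(value, lo), hi)
```

Because sums differ from averages only by constant factors, this gives the same value as scr4’s formula whenever only one score is missing.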

There are other more complicated methods for imputing the missing data but those require a great deal of modeling. For your purposes I think this will be fine.

…

Or what scr said.

Blue_Blistering_Barnacle · January 17, 2017, 8:16pm

Policywise, I think you should present this idea to your ‘group’ (if you have one) for buy-in, acceptance, and voting prior to making the calculation and finding a result.

If this is an event which has any chance of recurring, you should get it in the procedures now, as well.

leahcim · January 17, 2017, 11:35pm

And, if there are multiple viable ways of doing the “correction”, do several of them and see whether they change the outcome in a material way. If they don’t, then the specific choice you make will be less controversial.
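One way to run that check is to plug each candidate fill-in value into the gap and compare the resulting rankings. A minimal sketch, assuming a hypothetical one-list-per-applicant layout; only two of the thread’s methods are included, and others could be added to the `fills` dictionary.

```python
def ranking_with(scores, missing_applicant, missing_judge, fill):
    """Applicant indices ordered best-first after plugging `fill` into the gap."""
    filled = [list(row) for row in scores]
    filled[missing_applicant][missing_judge] = fill
    totals = [sum(row) for row in filled]
    # sorted() is stable, so exact ties keep their input order
    return sorted(range(len(filled)), key=lambda i: totals[i], reverse=True)


def compare_methods(scores, missing_applicant, missing_judge):
    """Ranking produced by each fill-in rule, keyed by a short description."""
    row = scores[missing_applicant]
    others = [s for j, s in enumerate(row) if j != missing_judge]
    given = [r[missing_judge] for i, r in enumerate(scores) if i != missing_applicant]
    fills = {
        "mean of the other judges": sum(others) / len(others),
        "mean of that judge's other scores": sum(given) / len(given),
    }
    return {name: ranking_with(scores, missing_applicant, missing_judge, v)
            for name, v in fills.items()}
```

If only the top k places matter, compare `tuple(r[:k])` across the returned rankings instead of the full lists.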

Blue_Blistering_Barnacle · January 18, 2017, 12:33am

Yeah, I’d agree. But I’m not sure: then you’d know which method yields which outcome. It’s great if the outcome is unchanged, but how do you “unring the bell” once you’ve done the calculations?

I guess you could make a spreadsheet to do the various methods and replace individual names with numbers so you don’t (easily) know who may be benefitting from your decision. That would work better if you had a trusted person to enter the data.

Isilder · January 18, 2017, 12:51am

No need to do all of them.

Do one of them, and if that shows the result could be important, then you might do several to show that the result would be the same with any method.

If you still have a controversy then it goes to rules and precedents and what you need to do to avoid being lynched.

leahcim · January 18, 2017, 2:05am

Since there is only one applicant with a missing score, I think it is pretty clear who stands to win or lose based on the specific method chosen to fill in the data. Blinding the experiment at this point is probably not that useful.

It really depends on what you plan to do with the scores. If you are going to pick the best one, then it doesn’t matter if changing the resolution strategy moves that person from 12th to 8th place. If you are going to pick the top ten, it does.

Hell, maybe even if you assume the missing data point is 10/10 it doesn’t put that person into contention. Knowing that would make Sigene’s life a lot easier (and finding out it’s not true doesn’t make his life any harder).
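That bound is cheap to compute: fill the gap with the highest and then the lowest possible score and see where the applicant lands. A minimal sketch, assuming a 0 to 10 scale (an assumption; the thread never states it) and a hypothetical one-list-per-applicant layout:

```python
def rank_bounds(scores, missing_applicant, missing_judge, lo=0, hi=10):
    """Best and worst possible rank (1 = top) of the applicant with the gap."""
    def rank_if(fill):
        filled = [list(row) for row in scores]
        filled[missing_applicant][missing_judge] = fill
        totals = [sum(row) for row in filled]
        # rank = 1 + number of applicants with a strictly higher total
        return 1 + sum(t > totals[missing_applicant] for t in totals)
    return rank_if(hi), rank_if(lo)
```

If both ranks fall on the same side of the cutoff that matters, the choice of imputation method cannot change the outcome.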

septimus · January 18, 2017, 7:10am

For definiteness, let Wally be the judge who didn’t score an applicant named Fred.

It’s interesting to see what happens with various proposals in extreme cases. (Yes, the actual data aren’t so extreme, but it still might help understanding.)

Suppose all the judges give exactly the same scores for everyone, with Fred in first place (except from Wally), Bill a close second, and the others far behind. Leahcim’s policy will push Bill into 1st place.
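The first extreme case can be reproduced with made-up numbers. This sketch assumes a hypothetical three-judge panel that gives identical scores, with the last judge standing in for Wally; Carol, Dave and Erin are invented filler applicants.

```python
def leahcim_fill_demo():
    """Fred leads on every submitted score, yet leahcim's rule drops him to 2nd."""
    shared = {"Fred": 10, "Bill": 9, "Carol": 2, "Dave": 2, "Erin": 2}
    scores = {name: [v, v, v] for name, v in shared.items()}
    scores["Fred"][2] = None  # Wally (judge index 2) never scored Fred
    # leahcim's rule: use Wally's average over everyone else
    wally_given = [row[2] for name, row in scores.items() if name != "Fred"]
    scores["Fred"][2] = sum(wally_given) / len(wally_given)
    totals = {name: sum(row) for name, row in scores.items()}
    return max(totals, key=totals.get), totals
```

Wally’s average over the others is (9 + 2 + 2 + 2) / 4 = 3.75, so Fred totals 23.75 against Bill’s 27.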

Suppose the other judges all rank the candidates closely, but in a distinct order with Albert first and Fred last. Wally demonstrates his enthusiasm for Albert with a huge score (and of course doesn’t score Fred at all). Fred, despite having nothing but worst scores, can end up in 2nd place.

Thus my own proposal, which should avoid most charges of unfairness:

Rank the 15 applicants using all the scores except Wally’s. Call Fred’s rank R_F; this will be Fred’s final rank. Rank the other 14 applicants using all the judges’ scores; use these for all the other ranks R_x. (Of course each such rank with R_x ≥ R_F must be replaced with R_x + 1 to make room for Fred.) Note that, except for Fred, the 15-applicant ranking without Wally might be completely opposite to the 14-applicant ranking with all five judges!
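septimus’s procedure can be sketched in Python as follows, assuming a hypothetical one-list-per-applicant layout where `fred` and `wally` are indices. How to break exact ties is not specified in the post; this sketch uses the input order, which is an arbitrary choice.

```python
def septimus_ranks(scores, fred, wally):
    """Final rank (1 = best) for every applicant under septimus's rule."""
    n = len(scores)
    # Step 1: rank everyone without Wally's scores; Fred keeps this rank (R_F)
    without_wally = [sum(s for j, s in enumerate(row) if j != wally) for row in scores]
    r_f = 1 + sum(t > without_wally[fred] for t in without_wally)
    # Step 2: rank the other applicants using all judges' scores
    others = [i for i in range(n) if i != fred]
    full = {i: sum(scores[i]) for i in others}
    order = sorted(others, key=lambda i: full[i], reverse=True)
    ranks = {fred: r_f}
    for pos, i in enumerate(order, start=1):
        # Step 3: shift every rank at or below R_F down one to make room for Fred
        ranks[i] = pos + 1 if pos >= r_f else pos
    return ranks
```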

If a numeric score is needed, the midpoint among the scores which preserve that R_F can be chosen. Left as an unsolved problem: find an arithmetic procedure for achieving this R_F stability without explicitly using rank order.
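One reading of the midpoint rule: find the range of scores Wally could have given Fred that keep Fred’s all-judge total between his neighbours in the final ranking, clip it to the scale, and take its middle. A minimal sketch under that interpretation, assuming a 0 to 10 scale (an assumption); the function and parameter names are hypothetical.

```python
def midpoint_score(other_totals, fred_partial, r_f, lo=0, hi=10):
    """Midpoint of the Wally scores for Fred that keep Fred at rank r_f, or None.

    other_totals: all-judge totals of the other applicants.
    fred_partial: Fred's total from the judges who did score him.
    """
    order = sorted(other_totals, reverse=True)
    # Fred must not overtake the applicant just above his slot...
    upper = hi if r_f == 1 else min(hi, order[r_f - 2] - fred_partial)
    # ...nor fall below the applicant just beneath it
    lower = lo if r_f > len(order) else max(lo, order[r_f - 1] - fred_partial)
    if lower > upper:
        return None  # no score on the scale preserves r_f
    return (lower + upper) / 2
```

A `None` result means no score on the assumed scale is consistent with R_F, which can happen when the two rankings disagree sharply.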

Related topics

Statistical way to properly kickout an outlier (Factual Questions; 56 replies, 1506 views; June 3, 2024)
Scoring an ordering question (Factual Questions; 16 replies, 1050 views; October 21, 2019)
statistics question (Factual Questions; 35 replies, 2033 views; August 22, 2009)
Filling in missing crossword data (Factual Questions; 4 replies, 1622 views; September 14, 2011)
Comparing Internet Reviews More Accurately with Mathematics (Factual Questions; 35 replies, 438 views; March 20, 2025)