# calculating averages of percentages

The situation is a bit complex but the question is simple – what is the proper method for calculating an average of percentages?

There are two possible methods of calculating the average deferral as a percentage of salaries:

1. add up the individual deferral percentages and divide by the number of cases

2. determine the average deferral and the average salary, then divide the average deferral by the average salary and express the result as a percentage (you get the same answer by dividing total deferrals by total salaries, since the number of cases used in the two averages cancels out).
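The two methods above can be sketched in a few lines of Python. This is only an illustration: the function names and the two-employee figures are made up for the example and do not come from any regulation or from the post.

```python
def mean_of_percentages(numerators, denominators):
    """Method 1: compute each case's percentage, then average the percentages."""
    percentages = [100 * n / d for n, d in zip(numerators, denominators)]
    return sum(percentages) / len(percentages)


def pooled_percentage(numerators, denominators):
    """Method 2: total numerator over total denominator, as a percentage."""
    return 100 * sum(numerators) / sum(denominators)


# Hypothetical figures (assumed for illustration): two employees.
deferrals = [1_000, 3_000]
salaries = [20_000, 100_000]

method_1 = mean_of_percentages(deferrals, salaries)  # averages 5% and 3%
method_2 = pooled_percentage(deferrals, salaries)    # 4,000 deferred out of 120,000
print(f"method 1: {method_1:.2f}%  method 2: {method_2:.2f}%")
```

With these assumed numbers the two methods disagree because the employees have different salaries; when every denominator is the same, the two results coincide.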

The second method is exactly what you would do if you wanted to calculate a team batting average (substituting “hits” for deferrals and “at bats” for salaries).

Somewhere in the process of getting three college degrees, I learned (or thought that I learned) that the second method is the only proper method of calculating averages involving percentages. I’ve now reviewed 18 statistics books and a dozen math texts and have not found any mention that averaging percentages requires any special handling. I didn’t find anything saying that the second method was wrong – it simply wasn’t mentioned. My internet searches gave hits that supported method 2 but I didn’t find anything definitive.

The problem with the first method is that it gives distorted results, particularly when the number of cases is small. For example, if a student gets 2 out of 10 on a pop quiz and then 100 out of 100 on the final, what’s his average percentage and does he pass the course?

Method 1: ( 20% + 100% ) / 2 = 60% = “F” failure

Method 2: ((2+100) / 2) / ((10+100) / 2) = 102 / 110 = roughly 93% = “A” pass
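The quiz-and-final arithmetic can be checked with a short Python sketch. The variable names are chosen here for illustration; the scores are the ones from the example (2 out of 10, then 100 out of 100).

```python
# (points earned, points possible) for the pop quiz and the final
results = [(2, 10), (100, 100)]

# Method 1: average the two percentage scores.
method_1 = sum(100 * earned / possible for earned, possible in results) / len(results)

# Method 2: total points earned over total points possible.
total_earned = sum(earned for earned, _ in results)
total_possible = sum(possible for _, possible in results)
method_2 = 100 * total_earned / total_possible

print(f"method 1: {method_1:.1f}%")  # average of 20% and 100%
print(f"method 2: {method_2:.1f}%")  # 102 points out of 110
```

Method 1 treats the 10-point quiz and the 100-point final as equally important, while method 2 lets the final count ten times as much.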

Or, consider a baseball team. For simplicity, we’ll say that there are only 3 players, and each is getting 30 hits per 100 at bats. In this case, the team batting average is 30% regardless of method. (I’ll spare you the calculations).

Now, let’s consider two small changes. A player is called up from the minors, goes to the plate, gets a hit and then is injured and sent down.

Scenario 1: just one additional at bat:

Method 1: (30%+30%+30%+100%)/4 = 190%/4 = 47.5%
Method 2: (30+30+30+1) / 301 = 30.2%

Scenario 2: new player substitutes for regular player (and gets a hit)

Method 1: (30%+30%+ (29/99) + 100%)/4 = 47.3%
Method 2: (30+30+29+1) / (100+100+99+1) = 90/300 = 30%

In scenario 2, method 2 shows that there was no change (same # of hits, same # at bats, no change) but method 1 reports the new team batting average as 47.3%.
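Both batting scenarios can be reproduced with the sketch below. The helper names are assumptions made for this example; the hit and at-bat counts are the ones given above.

```python
def method_1(players):
    """Average of each player's individual batting percentage."""
    return sum(100 * hits / at_bats for hits, at_bats in players) / len(players)


def method_2(players):
    """Team hits divided by team at bats, as a percentage."""
    total_hits = sum(hits for hits, _ in players)
    total_at_bats = sum(at_bats for _, at_bats in players)
    return 100 * total_hits / total_at_bats


# Each entry is (hits, at bats).
baseline = [(30, 100), (30, 100), (30, 100)]
scenario_1 = baseline + [(1, 1)]                       # call-up adds one extra at bat
scenario_2 = [(30, 100), (30, 100), (29, 99), (1, 1)]  # call-up replaces one at bat

for name, players in [("baseline", baseline),
                      ("scenario 1", scenario_1),
                      ("scenario 2", scenario_2)]:
    print(f"{name}: method 1 = {method_1(players):.1f}%, "
          f"method 2 = {method_2(players):.1f}%")
```

The single 1-for-1 player moves method 1 by more than 17 points in both scenarios, while method 2 barely moves in scenario 1 and does not move at all in scenario 2.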

Unfortunately, this is not simply an academic exercise for me. The tax law sets a limit on deferrals to 401(k) plans for highly compensated employees (HCE) based on the average deferral percentage (ADP) of non-highly compensated employees (NHCE) but does not specify how these averages are to be calculated. The IRS regulations say that method 1 is to be used. I think that this is mathematically wrong and that method 2 should be used.

The odd thing is – since the IRS says that firms have to use method 1, it becomes trivially easy to defeat the intent of the law. For a small firm, one low-paid part-timer with a high contribution rate can bump up the apparent NHCE Average Deferral Percentage so high that the ADP limit is meaningless (e.g. 90% deferral percentage, 20 employees, no other deferrals: method 1 yields a 4.5% ADP, since (90%+0+0…) / 20 = 4.5%). Using method 2, this additional employee would mean virtually no change in the ADP.
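A rough Python sketch of this 20-employee case follows. The post gives no salaries, so the salary figures below (a 5,000 part-timer and nineteen 40,000 employees) are purely hypothetical, and the method 2 result depends entirely on them; only the 90% deferral rate and the head count come from the example.

```python
# Assumed salaries (hypothetical, not from the post).
part_timer_salary = 5_000
regular_salary = 40_000
regular_count = 19

salaries = [part_timer_salary] + [regular_salary] * regular_count
# The part-timer defers 90% of pay; nobody else defers anything.
deferrals = [0.90 * part_timer_salary] + [0.0] * regular_count

# Method 1: average of the individual deferral percentages.
method_1 = sum(100 * d / s for d, s in zip(deferrals, salaries)) / len(salaries)

# Method 2: total deferrals over total salaries.
method_2 = 100 * sum(deferrals) / sum(salaries)

print(f"method 1 ADP: {method_1:.2f}%")
print(f"method 2 ADP: {method_2:.2f}%")
```

Under method 1 the result is 4.5% whatever the salaries are; under method 2 the part-timer's small paycheck keeps the figure well under 1% with these assumed salaries.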

So, what is the correct approach to calculating an average of percentages?

method 1 (sum %s/N)

method 2 ( average numerator / average denominator ) or its equivalent
( total numerator / total denominator )

and can you give any citations to back up your opinion?

Thanks,

davet

It depends on what you want to express. Method one answers the question “For this next case coming up, what can I expect the deferral percentage to be?” Method two answers the question “Overall, how many dollars, per 100, get deferred?”

So you can say that neither is correct. Or both. It just depends on what you want your answer to mean.
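One way to see how the two answers relate, as a sketch using symbols that do not appear in the thread: write $n_i$ and $d_i$ for the numerator and denominator of case $i$, $p_i = n_i / d_i$ for its percentage, and $N$ for the number of cases. Method 2 is then a weighted average of the same percentages that method 1 averages with equal weights:

$$
\frac{\sum_i n_i}{\sum_i d_i} \;=\; \sum_i w_i \, p_i ,
\qquad w_i = \frac{d_i}{\sum_j d_j},
\qquad\text{versus}\qquad
\frac{1}{N} \sum_i p_i .
$$

So the choice between the methods comes down to whether each case should count equally or in proportion to its denominator.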

Technically speaking, averaging %s isn’t any different from averaging any other numbers. So technically speaking, the first method ((% + % + % + %) / 4) is correct. It’s just that this method rarely answers the question at hand.

ETA: An example…
What’s the average of 20% and 100%? It is, and always has been, 60%. That will never change. It’s a fundamental law of the universe. All of math would collapse if it weren’t so. But that 60% is usually meaningless in the context of your problem. That’s not a flaw of the math, it’s a flaw of your question. You usually don’t want to know what the average of the numbers is in the first place; you usually want a percentage of the overall picture, which is a completely different question and requires a completely different calculation.

Method 1 yields an answer to “What is the average test score?” Method 2 yields an answer to “What percentage of possible points has the student earned?” It’s not that Method 1 is wrong; it’s just not useful. You want to know the second answer, not the first.

The question you need to ask yourself is “Should the salary of the employee be taken into account?” It seems like the IRS wants HCEs to be like normal people. If Bob puts away 5% of his earnings and Jim puts away 10% of his earnings, should it matter how much they make (assuming they’re NHCEs)? I’d say it shouldn’t matter. 7.5% is the average percentage that NHCEs defer from their pay. That should be the cap for HCEs.

But as you said earlier, it depends what you want the answer to mean.
If we assume that every test question covers a unique topic, method 2 gives the better representation of the student’s knowledge. He successfully answered 102 out of 110 questions.
If we assume however that each test covered a different topic, then method 1 gives a better representation of the student’s knowledge. He didn’t learn the topic of test 1, but did learn the topic of test 2. The fact that the second test had more questions does not necessarily mean that the score of the test should carry more weight.

Yeah, I’d agree with that.