The situation is a bit complex but the question is simple – what is the proper method for calculating an average of percentages?
There are two possible methods of calculating the average deferral as a percentage of salaries:

1. add up the individual deferral percentages and divide by the number of cases

2. determine the average deferral and the average salary, then divide the average deferral by the average salary and express the result as a percentage (you get the same answer by dividing total deferrals by total salaries, since the number of cases used in the two averages cancels out).
The second method is exactly what you would do if you wanted to calculate a team batting average (substituting “hits” for deferrals and “at bats” for salaries).
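To make the two methods concrete, here is a minimal sketch in Python. The function names and the sample (deferral, salary) pairs are made up for illustration and come from no regulation or textbook. It also checks that method 2 is the same as method 1 with each percentage weighted by its own denominator, which is one way to see why the two methods diverge when the denominators differ.

```python
from fractions import Fraction


def mean_of_ratios(pairs):
    """Method 1: average the individual percentages, one vote per case."""
    return sum(Fraction(n, d) for n, d in pairs) / len(pairs)


def ratio_of_totals(pairs):
    """Method 2: total numerator divided by total denominator."""
    return Fraction(sum(n for n, _ in pairs), sum(d for _, d in pairs))


def weighted_mean_of_ratios(pairs):
    """Method 1 again, but each ratio is weighted by its share of the total denominator."""
    total = sum(d for _, d in pairs)
    return sum(Fraction(n, d) * Fraction(d, total) for n, d in pairs)


# ASSUMPTION: hypothetical (deferral, salary) pairs, chosen only to illustrate.
pairs = [(1000, 20000), (6000, 60000), (0, 40000)]

print(float(mean_of_ratios(pairs)))           # method 1
print(float(ratio_of_totals(pairs)))          # method 2
print(ratio_of_totals(pairs) == weighted_mean_of_ratios(pairs))
```

Exact fractions are used so the equality check is not affected by floating-point rounding.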
Somewhere in the process of getting three college degrees, I learned (or thought that I learned) that the second method is the only proper method of calculating averages involving percentages. I’ve now reviewed 18 statistics books and a dozen math texts and have not found any mention that averaging percentages requires any special handling. I didn’t find anything saying that the second method was wrong – it simply wasn’t mentioned. My internet searches gave hits that supported method 2 but I didn’t find anything definitive.
The problem with the first method is that it gives distorted results, particularly when the number of cases is small. For example, if a student gets 2 out of 10 on a pop quiz and then 100 out of 100 on the final, what’s his average percentage and does he pass the course?
Method 1: ( 20% + 100% ) / 2 = 60% = “F” failure
Method 2: ((2+100) / 2 ) / ( (10+100)/2) = 102 / 110 = about 92.7% = “A” pass
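The quiz arithmetic can be checked with a few lines of Python. This is only a sketch of the two calculations; the variable names are arbitrary.

```python
# (points earned, points possible) for the pop quiz and the final
scores = [(2, 10), (100, 100)]

# Method 1: average the two percentages
method_1 = sum(earned / possible for earned, possible in scores) / len(scores)

# Method 2: total points earned over total points possible
method_2 = sum(e for e, _ in scores) / sum(p for _, p in scores)

print(f"Method 1: {method_1:.1%}")  # 60.0%
print(f"Method 2: {method_2:.1%}")  # 92.7%
```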
Or, consider a baseball team. For simplicity, we’ll say that there are only 3 players, and each is getting 30 hits per 100 at bats. In this case, the team batting average is 30% regardless of method. (I’ll spare you the calculations).
Now, let’s consider two small changes. A player is called up from the minors, goes to the plate, gets a hit and then is injured and sent down.
Scenario 1: just one additional at bat:
Method 1: (30%+30%+30%+100%)/4 = 190%/4 = 47.5%
Method 2: (30+30+30+1) / 301 = 30.2%
Scenario 2: new player substitutes for regular player (and gets a hit)
Method 1: (30%+30%+ (29/99) + 100%)/4 = 47.3%
Method 2: (30+30+29+1) / (100+100+99+1) = 90/300 = 30%
In scenario 2, method 2 shows that there was no change (same # of hits, same # at bats, no change) but method 1 reports the new team batting average as 47.3%.
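For anyone who wants to rerun the baseball numbers, the following sketch reproduces the starting lineup and both scenarios. The helper names `method_1` and `method_2` are just labels matching the post, not standard terminology.

```python
def method_1(players):
    """Average of the individual batting averages."""
    return sum(hits / at_bats for hits, at_bats in players) / len(players)


def method_2(players):
    """Total hits divided by total at bats."""
    return sum(h for h, _ in players) / sum(ab for _, ab in players)


# (hits, at bats) per player
lineups = {
    "three regulars": [(30, 100), (30, 100), (30, 100)],
    "scenario 1": [(30, 100), (30, 100), (30, 100), (1, 1)],
    "scenario 2": [(30, 100), (30, 100), (29, 99), (1, 1)],
}

for name, players in lineups.items():
    print(f"{name}: method 1 = {method_1(players):.1%}, "
          f"method 2 = {method_2(players):.1%}")
```

The call-up's single at bat counts as a full quarter of the result under method 1, but as 1 at bat out of about 300 under method 2.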
Unfortunately, this is not simply an academic exercise for me. The tax law sets a limit on deferrals to 401(k) plans for highly compensated employees (HCEs) based on the average deferral percentage (ADP) of nonhighly compensated employees (NHCEs) but does not specify how these averages are to be calculated. The IRS regulations say that method 1 is to be used. I think that this is mathematically wrong and that method 2 should be used.
The odd thing is – since the IRS says that firms have to use method 1, it becomes trivially easy to defeat the intent of the law. For a small firm, one low-paid part-timer with a high contribution rate can bump up the apparent NHCE average deferral percentage so high that the ADP limit is meaningless. (For example, with a 90% deferral percentage for that one employee, 20 employees in total, and no other deferrals, method 1 yields a 4.5% ADP: (90% + 0 + 0 + …) / 20 = 4.5%.) Using method 2, this additional employee would mean virtually no change in the ADP.
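A rough sketch of the 20-employee example follows. Method 2 needs salary figures, which the example above does not give, so the salaries below are purely hypothetical assumptions; the method 2 result changes with them, and shrinks as the part-timer's pay gets smaller relative to everyone else's.

```python
# ASSUMPTION: both salary figures are hypothetical, chosen only for illustration.
PART_TIMER_SALARY = 10_000
FULL_TIMER_SALARY = 40_000

# (deferral, salary): one part-timer deferring 90% of pay, 19 others deferring nothing
employees = [(9_000, PART_TIMER_SALARY)] + [(0, FULL_TIMER_SALARY)] * 19

# Method 1: average of the individual deferral percentages
adp_method_1 = sum(d / s for d, s in employees) / len(employees)

# Method 2: total deferrals over total salaries
adp_method_2 = sum(d for d, _ in employees) / sum(s for _, s in employees)

print(f"Method 1 ADP: {adp_method_1:.2%}")
print(f"Method 2 ADP: {adp_method_2:.2%}")
```

Method 1 gives 4.5% no matter what anyone earns, because salaries cancel out of each individual percentage before the averaging happens.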
So, what is the correct approach to calculating an average of percentages?
method 1 (sum %s/N)
method 2 (average numerator / average denominator) or its equivalent (total numerator / total denominator)
and can you give any citations to back up your opinion?
Thanks,
davet