A and B produce an estimated value of some quantity. Each is estimating a separate value.
A’s estimate (EA): 35
B’s estimate (EB): 2980
It turns out the actual values were:
A’s actual value (AA): 42
B’s actual value (AB): 2541
Who is the better estimator?
Using the classic percentage difference calculation: P = |actual - estimated| / actual
A’s %diff (PA): |42 - 35| ÷ 42 = 16.7%
B’s %diff (PB): |2541 - 2980| ÷ 2541 = 17.3%
The goal is to provide some objective clue* as to who is the better estimator. By this measure, A and B are equally skilled at estimation - each is off by about 17%. The smaller the P, the “better” the estimator, so it could even be said A is slightly better. This isn’t a fair comparison, though, if we know B’s job was tougher in some other way.
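The baseline calculation can be sketched in a few lines of Python (the helper name `pct_diff` is my own, not anything standard):

```python
def pct_diff(actual, estimated):
    # Classic percentage difference: |actual - estimated| / actual
    return abs(actual - estimated) / actual

PA = pct_diff(42, 35)      # A: ~16.7%
PB = pct_diff(2541, 2980)  # B: ~17.3%
print(f"PA = {PA:.1%}, PB = {PB:.1%}")
```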
There are three refinements I want to add to the calculation above. Each should reduce P by some amount, and each reduction will likely contain some “F-factor” to allow tweaking and tailoring.
1) Scale (P’s first refinement: PR[sub]1[/sub])
Let’s assume the difficulty in estimating scales with the value’s magnitude - the bigger the value, the harder it is to estimate. In that case, B did way better with EB.
How can I modify the formula to reflect this? Something like PR[sub]1[/sub] = P * (1 - S(A)), where S(A) is some function of A that starts at zero and horizontally asymptotes to 1 as A increases without bound.
It’s been too long since I’ve done this kind of math. Using Wolfram Alpha I landed on a formula for a horizontal asymptote with a “magic number” that moves the “knee of the curve” left or right, making the formula fairer; the number itself can be set via discussion, historical analysis, or trial and error.
First attempt: S(A) = A / sqrt(A[sup]2[/sup] + F), where F moves the “knee” left or right. (With F = 100 million, A’s P gets a near-zero reduction and B’s gets about 25%, i.e. PR[sub]1[/sub]B ≈ 13%.)
Is there a better approach?
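For what it’s worth, the first attempt checks out numerically. A minimal sketch (function names `S` and `PR1` are just labels matching the notation above):

```python
import math

def S(A, F):
    # Grows from ~0 toward a horizontal asymptote at 1 as A increases;
    # F (the "magic number") shifts the knee of the curve.
    return A / math.sqrt(A**2 + F)

def PR1(P, A, F=100_000_000):
    # Scale-adjusted percentage difference.
    return P * (1 - S(A, F))

PA, PB = 7 / 42, 439 / 2541
print(f"PR1A = {PR1(PA, 42):.1%}")    # near-zero reduction for A
print(f"PR1B = {PR1(PB, 2541):.1%}")  # ~25% reduction for B, landing near 13%
```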
2) 0-n complexifying factors (PR[sub]2[/sub])
There could be any number of independent factors that make one estimation tougher than another. Let’s assess each of these factors on a scale (1-5 or 1-10). Some factors may apply to a given estimate; others may not.
Say A’s value had one complexifying factor and B’s value had two, each assessed on a scale of 1-5. Then we’re looking at something like this:
PR[sub]2[/sub]A = PA * (1 - C[sub]1[/sub])
PR[sub]2[/sub]B = PB * (1 - C[sub]1[/sub]) * (1 - C[sub]2[/sub])
Is it just a simple matter of defining C[sub]x[/sub] = assessment / scale? Is that fair? I guess it should be assessment / (scale + F) with F > 0, to ensure a top-of-scale assessment doesn’t unconditionally reduce PR to 0%.
3) It’s better to overestimate (PR[sub]3[/sub])
I’m not even sure where to begin on this, but the fact of the matter is that, all other things being equal, the estimate that comes in above the actual is the better estimate. Ordinarily I’d say it’s not ideal to reduce P to reflect this - better to drop the absolute value and keep the sign to show which side the estimate landed on. But a number of PRs will be averaged to produce a final “score,” if you will.
So perhaps: PR[sub]3[/sub] = |P| * (1 - if(P < 0, F, 0))
Where F is some value between 0 and 1 (say 0.1 to start), and P here is the signed version, (actual - estimated) / actual, which is negative exactly when the estimate came in over the actual.
Maybe a better measure is a separate metric based on the count of over-estimates vs. under-estimates.
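Both options can be sketched side by side; a minimal version, assuming the discount fires only on over-estimates (function names `PR3` and `over_under_counts` are my own):

```python
def PR3(actual, estimated, F=0.1):
    # Keep |P| so the PRs can be averaged, but discount by F when the
    # estimate came in over the actual (over-estimating is preferred).
    P = abs(actual - estimated) / actual
    return P * (1 - F) if estimated > actual else P

def over_under_counts(history):
    # Alternative separate metric: tally over- vs. under-estimates
    # across a history of (actual, estimated) pairs.
    overs = sum(1 for actual, est in history if est > actual)
    unders = sum(1 for actual, est in history if est < actual)
    return overs, unders
```

With the running example, B (who overestimated) gets the discount while A (who underestimated) does not, and the count metric reports one of each.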
The GQs are:
Is the math sound? How can it be improved? Is the approach sound?
* I use the word “clue” to acknowledge that no formulaic model can definitively declare who is the better estimator, but it does make a worthwhile contribution to an overall comparison.