Why do statisticians care more about bias than mean square error?

Both concepts are covered when estimator criteria are introduced, but lack of bias seems to be the more sought-after characteristic. (Specifically, the “minimum variance unbiased estimator” seems to be popular).

To me, minimizing the error of the estimate seems more relevant than whether umpteen data sets will on average give you the correct answer. But that’s just me.

Is bias emphasized for reasons that are not immediately apparent? Or is mean square error de-emphasized because it is difficult to calculate in a general context? Or is there something else that I am missing?

As I understand the concept of bias, an estimate that has a high variance is not very reliable, but an estimate that is biased is just plain wrong.

I’m afraid that I don’t know how to include the neat-looking web references, but the following one gives a very good example of why you want to avoid biased estimates:

http://mathforum.org/library/drmath/view/52807.html

The specific example given there is four people making estimates of how much food a bear eats: if each person performs his/her analysis in a different season, the person making his/her analysis in the fall will conclude that the bear eats an awful lot (when he’s generating his body fat in preparation for hibernation), while the person making his/her analysis in the winter will conclude that the bear doesn’t eat anything at all (when all the bear is doing is sleeping). Note that in both cases the variance of the estimate will be low, but the estimate will be biased.

Here’s another fine article describing why you want to avoid bias:

http://members.fortunecity.com/jonhays/stories.htm

It includes a description of the famous poll taken by *Literary Digest* magazine that predicted that Alf Landon was going to crush FDR (the actual landslide was in the opposite direction). It turned out that the magazine folks used telephone books and car registration lists to get the people they polled, which in those days meant they basically only polled well-off Republicans and left out FDR's main political base.

It also describes a “does seeding rain clouds produce rain” study that took much of its data from reports of companies which seeded rain clouds for a living. Guess what the conclusion was?

IANAS (I am not a statistician), but I do know that the preceding two posts are confusing two uses of the term "bias" in statistics. See, for example, the Wikipedia entry on bias for an explanation. I believe that the OP was asking about unbiased estimators, not the problem of biased samples. Having said that, I don't really have an answer for the OP, but poking around on the web I find various references to the fact that maximum likelihood estimators are generally preferred over unbiased estimators when the two differ. On the other hand, common practice is still to use the unbiased sample variance and sample standard deviation (the ones with n-1 in the denominator) rather than the more natural-looking maximum likelihood estimator (the one with n in the denominator), which is slightly biased. I don't know why.
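As an aside, the n versus n-1 difference is easy to see by simulation (my own quick sketch, not from any of the linked pages): Python's `statistics.pvariance` divides by n, which is the maximum likelihood estimator under normality, while `statistics.variance` divides by n-1 and is unbiased.

```python
import random
import statistics

# Simulate many small samples from N(0, sd=2), true variance = 4,
# and compare the average of the two variance estimators.
random.seed(42)
true_var = 4.0
n, trials = 5, 100_000

mle_total = 0.0       # divisor n (maximum likelihood under normality)
unbiased_total = 0.0  # divisor n - 1 (unbiased)

for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    mle_total += statistics.pvariance(sample)
    unbiased_total += statistics.variance(sample)

# The n-divisor estimator averages about (n-1)/n * true_var = 3.2;
# the n-1 version averages about 4.0.
print(mle_total / trials)
print(unbiased_total / trials)
```

With a small n like 5 the downward bias of the n-divisor version is substantial; for large n the two estimators are nearly identical.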

If you know the theoretical bias, you can correct for it, which makes a biased estimator a little easier to deal with than a high-variance estimator, IMO. But I’m not a professional statistician, so we’ll wait for someone else to come along.

Bias introduces error that cannot be eliminated by averaging or anything else. Your mean-square-error will be wrong (or unprovable) if the underlying statistic is biased. Therefore all the probability estimates (F-values, p-values) will be meaningless.

(I think there is an unintentional pun there somewhere).
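The point about averaging can be demonstrated with a toy simulation (my own, with made-up numbers): averaging drives zero-mean noise toward zero, but a fixed systematic bias survives any amount of averaging.

```python
import random

# Hypothetical setup: each measurement is the true value plus a fixed
# systematic bias plus zero-mean noise.
random.seed(0)
true_value = 10.0
bias = 1.5
noise_sd = 3.0

measurements = [true_value + bias + random.gauss(0, noise_sd)
                for _ in range(100_000)]
avg = sum(measurements) / len(measurements)

# The noise (sd 3.0) averages away; the bias (1.5) does not.
print(avg)  # close to 11.5, not 10.0
```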

Thanks to everyone for their replies.

Topologist is indeed correct: I had in mind bias as it relates to estimators, not samples. Still, WillGolfForFood is correct in pointing out that biased samples are in general A Bad Thing and at the very least call for some sort of reweighting.

Some points

-> Just to make it clearer: the Mean Square Error criterion reflects both the bias and the variance of a given estimator. It is the average squared error of the estimator relative to the true value, and it decomposes as MSE = variance + bias².

-> I am not alone in having problems with bias as a criterion for what makes a good estimator. Kennedy (1998) quotes Savage (1954), “A serious reason to prefer unbiased estimates seems never to have been proposed”. Kennedy continues, “None the less (sic), unbiasedness has enjoyed remarkable popularity among practitioners. Part of the reason for this may be due to the emotive content of the terminology: who can stand up in public and state that they prefer biased estimators?”.

-> More Kennedy: “The main objection to the unbiasedness criterion is summarized nicely by the story of the 3 econometricians who go duck hunting. The first shoots about a foot in front of the duck, the second about a foot behind; the third yells, ‘We got him!’”

-> I don’t know whether Maximum Likelihood estimators have lower Mean Square Errors in general. (Really! I lack knowledge.)
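The decomposition behind the first point above can be checked numerically. This sketch (my own, using a made-up shrinkage estimator 0.9·x̄) verifies that MSE = variance + bias² holds exactly for the simulated estimates:

```python
import random

# Estimate theta = 2.0 from the sample mean of n normal draws, then
# deliberately shrink the mean by 0.9 to create a biased estimator.
random.seed(1)
true_theta = 2.0
n, trials = 20, 50_000

estimates = []
for _ in range(trials):
    xbar = sum(random.gauss(true_theta, 1.0) for _ in range(n)) / n
    estimates.append(0.9 * xbar)

mean_est = sum(estimates) / trials
bias = mean_est - true_theta                                  # about -0.2
var = sum((e - mean_est) ** 2 for e in estimates) / trials    # about 0.9**2 / 20
mse = sum((e - true_theta) ** 2 for e in estimates) / trials

# MSE decomposes exactly into variance plus squared bias.
print(abs(mse - (var + bias ** 2)))  # essentially zero (floating-point noise)
```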


Still, I’m not posing a GD; rather I’m asking what the justification for the unbiasedness criterion is, relative to MSE. After posting, I thought of a possible explanation which may or may not be confused. [1]

A given estimating approach will produce a range of mean square errors depending upon the sample size or particular model that is being estimated. In contrast, an unbiased estimator will always be unbiased.

Now, if a given estimator gave the minimum mean square error under all circumstances, that of course would be a compelling attribute. But that criterion seems a little stringent.

Alternatively, we could ask whether an estimator had on average a lower MSE, relative to another estimator. But I speculate that such a calculation may be difficult.
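One comparison of that sort is easy to simulate (my own sketch of a standard textbook example for normal data): dividing the sum of squared deviations by n+1 gives a biased variance estimator, yet it has a lower MSE than either the n divisor or the unbiased n-1 divisor.

```python
import random

# Compare the MSE of three variance estimators that differ only in
# the divisor applied to the sum of squared deviations.
random.seed(7)
sigma2 = 1.0            # true variance
n, trials = 5, 200_000
sq_err = {n - 1: 0.0, n: 0.0, n + 1: 0.0}

for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    for d in sq_err:
        sq_err[d] += (ss / d - sigma2) ** 2

mse = {d: total / trials for d, total in sq_err.items()}
# Theory for normal data gives MSE(n+1) < MSE(n) < MSE(n-1);
# with n = 5 these are roughly 0.33, 0.36, and 0.50 respectively.
print(mse[n + 1], mse[n], mse[n - 1])
```

So the "best" divisor by the MSE criterion is the most biased of the three, which is exactly the tension the duck-hunting joke is getting at.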

Tentative WAG: It’s easier to make categorical statements about bias (or consistency for that matter) than it is about MSE.


[1] [sub]Actually, I read this in Harvey (1990), but fear that I may have completely misunderstood the point the author was making. So this isn’t really a cite.[/sub]