Type I vs Type II error: can someone dumb this down for me

…once and for all? I have studied it a million times and still can’t wrap my head around the theories or the language (eg null). Any real life example would be appreciated greatly.


Here’s my take, as someone who is not a statistician but who has taught basic statistics.

A Type I error is rejecting the null hypothesis if it’s true (and therefore shouldn’t be rejected). It’s sometimes likened to a criminal suspect who is truly innocent being found guilty. You’re saying there is something going on (a difference, an effect), when there really isn’t one (in the general population), and the only reason you think there’s a difference in the general population is because, thanks to random variations, your sample was different enough from the population to be misleading.

A Type II error is failing to reject the null hypothesis if it’s false (and therefore should be rejected). It’s likened to a criminal suspect who is truly guilty being found not guilty (not because his innocence has been proven, but because there isn’t enough evidence to convict him).

This site explains it this way: “Another way to look at Type I vs. Type II errors is that a Type I error is the probability of overreacting and a Type II error is the probability of under reacting.” (I would have said that the Type I and Type II errors are the overreacting and underreacting themselves, and that the probability of doing so is symbolized by the Greek letters alpha and beta, respectively.)

An example: You’ve developed a new auto fuel additive that you claim increases a car’s gas mileage. You test it on a random sample of cars under a random sample of driving conditions and find that the cars you tested did get somewhat better gas mileage than normal. This result can mean one of two things:

(1) The fuel additive doesn’t really make a difference, and the better mileage you observed in your sample is due to “sampling error” (i.e. because of other factors, the mileage tests in your sample just happened to come out higher than average). If you could test all cars under all conditions, you wouldn’t see any difference in average mileage at all in the cars with the additive. This would be the null hypothesis.

(2) The difference you’re seeing is a reflection of the fact that the additive really does increase gas mileage. If you could test all cars under all conditions, you would see an increase in mileage in the cars with the fuel additive. This would be the alternative hypothesis.

A Type I error occurs if you decide it’s #2 (reject the null hypothesis) when it’s really #1: you conclude, based on your test, that the additive makes a difference, when it really doesn’t.

A Type II error occurs if you decide that you haven’t ruled out #1 (fail to reject the null hypothesis), even though #1 is in fact false and #2 is what’s really going on. You conclude, based on your test, either that the additive doesn’t make a difference, or that maybe it does, but you didn’t see enough of a difference in the sample you tested to be willing to say there’s a difference in general.
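If it helps to see both errors happen, here is a rough simulation sketch of the fuel additive test in Python. Everything numeric is made up for illustration: a baseline of 30 mpg, a car-to-car spread of 3 mpg, 25 cars per experiment, and a "real" additive that adds 1 mpg. It also assumes the spread is known, so it uses a simple one-sided z-test rather than the t-test you would more likely use in practice.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, baseline_mpg, sigma, alpha=0.05):
    """One-sided z-test: is the sample mean convincingly above baseline?"""
    n = len(sample)
    z = (mean(sample) - baseline_mpg) / (sigma / n ** 0.5)
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha


def rejection_rate(true_gain, n_cars=25, trials=4000, seed=1):
    """Fraction of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)
    baseline_mpg, sigma = 30.0, 3.0  # hypothetical numbers, not real data
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(baseline_mpg + true_gain, sigma) for _ in range(n_cars)]
        rejections += rejects_null(sample, baseline_mpg, sigma)
    return rejections / trials


# Case #1 is true: the additive does nothing, so every rejection is a Type I error.
type1_rate = rejection_rate(true_gain=0.0)

# Case #2 is true: the additive adds 1 mpg, so every non-rejection is a Type II error.
type2_rate = 1 - rejection_rate(true_gain=1.0)

print(f"Type I rate (said it works, it doesn't): {type1_rate:.3f}")
print(f"Type II rate (missed a real effect):     {type2_rate:.3f}")
```

With these made-up settings the Type I rate should land near the 5% you chose as alpha, while the Type II rate comes out much larger, because 25 cars is not many for detecting a 1 mpg gain against a 3 mpg spread.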


A Type I error is also known as a false positive. In other words you make the mistake of assuming there is a functional relationship between your variables when there actually isn’t. For example, you are researching a new cancer drug and you come to the conclusion that it was your drug that caused the patients’ remission when actually the drug wasn’t effective at all.

A Type II error is the opposite: concluding that there was no functional relationship between your variables when actually there was. In this case, you conclude that your cancer drug is not effective, when in fact it is.

Both Type I and Type II errors are caused by failing to sufficiently control for confounding variables. Example: you make a Type I error in concluding that your cancer drug was effective, when in fact it was the massive doses of aloe vera that some of your patients were taking that caused the remission.

How about Larry Gonick’s take (paraphrased from his Cartoon Guide to Statistics):

Type 1 error: Alarm with no fire. (false positive)
Type 2 error: Fire with no alarm. (false negative)
Is that “dumbed-down” enough? :smiley:

Think of “no fire” as “no correlation between your variables”, or null hypothesis. (nothing happening)

Think of “fire” as the opposite, true correlation, and you want to reject the null hypothesis (because there really is something going on).

And “alarm” is evidence of correlation. So you WANT to have an alarm when the house is on fire…because you WANT to have evidence of correlation when correlation really exists.
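For the programmers in the thread, the fire/alarm picture boils down to a four-way lookup. This is only an illustrative sketch; the function name `classify` is mine, not from Gonick's book.

```python
def classify(fire: bool, alarm: bool) -> str:
    """Map (what is really true, what the test said) to an outcome."""
    if alarm and not fire:
        return "Type I error (false positive)"   # alarm with no fire
    if fire and not alarm:
        return "Type II error (false negative)"  # fire with no alarm
    if fire and alarm:
        return "correct (true positive)"
    return "correct (true negative)"


for fire in (False, True):
    for alarm in (False, True):
        print(f"fire={fire!s:5} alarm={alarm!s:5} -> {classify(fire, alarm)}")
```

Here "fire" stands for "the null hypothesis is false" and "alarm" stands for "the test rejected the null hypothesis".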

So how’d I do, statistics guys? Hope I didn’t foul those up and mess up the OP even further.

They’re not only caused by failing to control for variables. Sometimes, it’s just plain luck. If 10% of cancer goes into remission without treatment (made up statistic there), then you expect 2/20 patients to get better regardless of the medication. But there is a non-zero chance that 5/20, 10/20 or even 20/20 get better, providing a false positive. Or 0/20, giving you the false negative.
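To put rough numbers on that, here is a small sketch using the binomial distribution, with the same made-up 10% spontaneous remission rate and 20 patients, and assuming the patients are independent and the drug does nothing at all.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 20, 0.10  # 20 patients, made-up 10% remission rate without treatment

p_none = binom_pmf(0, n, p)
p_five_or_more = sum(binom_pmf(k, n, p) for k in range(5, n + 1))
p_all = binom_pmf(n, n, p)

print(f"P(0 of 20 improve)          = {p_none:.4f}")          # about 0.12
print(f"P(5 or more of 20 improve)  = {p_five_or_more:.4f}")  # about 0.04
print(f"P(all 20 improve)           = {p_all:.1e}")           # about 1e-20
```

So under these assumptions, a useless drug would show 5 or more remissions in roughly 1 trial out of 23 purely by luck, and a trial with no remissions at all would turn up about 1 time in 8.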

The bigger the sample and the more repetitions, the less likely dumb luck is and the more likely it’s a failure of control, but we don’t always have the luxury of large samples.
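A quick way to see the sample size effect is to compute the chance of missing a real effect (the Type II rate, beta) for a few sample sizes. This is only a sketch under simplifying assumptions of my own: a one-sided z-test with known spread, a true effect of 1 unit, a standard deviation of 3 units, and alpha fixed at 0.05.

```python
from statistics import NormalDist


def type2_rate(n, effect=1.0, sigma=3.0, alpha=0.05):
    """Beta for a one-sided z-test with known sigma (illustrative numbers)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)     # cutoff the test statistic must exceed
    shift = effect * n ** 0.5 / sigma   # how far the true effect moves the statistic
    return std.cdf(z_crit - shift)      # chance of still landing below the cutoff


for n in (10, 25, 50, 100, 200):
    print(f"n = {n:3d}  ->  chance of missing the effect = {type2_rate(n):.3f}")
```

Note that in this setup the Type I rate stays at 5% for every sample size, because that is what fixing alpha means. What the bigger sample buys you is a smaller Type II rate.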

I bring this up not just to pick nits, but because it was my key for understanding it. Sampling introduces a risk all of its own, and we can use proper logical and mathematical techniques to reach incorrect conclusions if the random sampling has produced a non-representative selection. Statistical analysis can never say “This is absolutely, 100% true.” All you can do is bet the smart odds (usually 95% or 99% certainty) and know that you’re occasionally making errors even though you did everything right.

OK, here is a question then: why do people insist on using the completely opaque, periphrastic, and easily confusable terms type I error and type II error, when the relatively transparent, succinct and distinct expressions false positive and false negative are readily available alternatives?

I opened this thread because, although I am sure I have been told before, I could not recall what type I and type II errors were, but I know perfectly well what is meant by false negative and false positive.

Because intro stats books still use the old terms. This is slowly changing, but it’s gonna be a while before the new terminology is standard.

An easy way for me to remember it is one up, two down.

A type 1 error is when you make an error while giving a thumbs up.

A type 2 error is when you make an error doing the opposite.

Somewhat related xkcd comic.

I’ve heard it as “damned if you do, damned if you don’t.” Type I error can be made if you do reject the null hypothesis. Type II error can be made if you do not reject the null hypothesis.

Thank you all, so so much–I can’t thank you enough. For the first time ever, I get it! And not just in theory; I see it in real life situations so it makes that much more sense.

This is as good as it gets in an Internet forum! :slight_smile:

I opened this thread to make the same complaint. Descriptive labels are so much more useful.

In my area of work, we use “probability of detection” (the complement of “false negative”) and “probability of false alarm” (equivalent to “false positive”).

In some fields the terms false alarm and missed detection are commonly used for type I and type II errors, respectively.

I find it easy to think about hypothesis testing in terms of guilt or innocence in a court case.

The Null hypothesis is the baseline assumption of what we would say if there was no evidence. In the court we assume innocence until proven guilty, so in a court case innocence is the Null hypothesis.

Type 1 error is the error of convicting an innocent person.
Type 2 error is the error of letting a guilty person go free.

Since we are most concerned about making sure we don’t convict the innocent, we set the bar pretty high. In practice this is done by limiting the allowable Type 1 error rate to 0.05 or less. In other words, we convict only if, were the person really innocent, there would be at most a 5% chance that he would appear this guilty. In real court cases we set the threshold much lower (beyond a reasonable doubt), with the result that we hopefully have a Type 1 error rate much lower than 0.05, but unfortunately have a fairly high Type 2 error rate, resulting in many crimes going unpunished.
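The trade-off can be sketched with a toy model. Suppose, purely as an assumption for illustration, that each case produces a single "evidence score" that is normally distributed around 0 for innocent defendants and around 2 for guilty ones, with a standard deviation of 1 in both groups. Raising the bar for conviction lowers the Type 1 rate but raises the Type 2 rate.

```python
from statistics import NormalDist

# Hypothetical evidence-score distributions, not based on any real data.
innocent = NormalDist(mu=0.0, sigma=1.0)
guilty = NormalDist(mu=2.0, sigma=1.0)


def acquittal_rate_for_guilty(alpha):
    """Type 2 rate when the conviction bar allows a Type 1 rate of alpha."""
    threshold = innocent.inv_cdf(1 - alpha)  # score needed to convict
    return guilty.cdf(threshold)             # guilty defendants who fall short


for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = acquittal_rate_for_guilty(alpha)
    print(f"innocent convicted: {alpha:6.3f}   guilty set free: {beta:.3f}")
```

In this toy model, tightening the allowed Type 1 rate from 5% to 0.1% pushes the share of guilty defendants who go free from roughly a third to well over three quarters.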

Because Type I and Type II errors are asymmetric in a way that false positive / false negative fails to capture.

In Type I errors, the evidence points strongly toward the alternative hypothesis, but the evidence is wrong. Perhaps the test was a freakish outlier, or perhaps there was some outside factor we failed to consider.

In Type II errors, the evidence doesn’t necessarily point toward the null hypothesis; indeed, it may point strongly toward the alternative–but it doesn’t point strongly enough. We fail to reject because of insufficient proof, not because of a misleading result.

A lay person hearing false positive / false negative is likely to think they are two sides of the same coin–either way, those dopey experimenters got it wrong. Whereas in reality they are two very different types of errors.

In the past I’ve used the example of a court trial. The null hypothesis (at least in the US) is innocence of the accused; that’s the initial assumption. A Type 1 error would be incorrectly convicting an innocent person. Type 2 would be letting a guilty person go free.

I’m not a lay person, but the “type I” and “type II” terms make it easier to conflate them, not harder. What’s the difference? A “one” or a “two”; seems pretty much the same.

While everyone knows that “positive” and “negative” are opposites. So a “false positive” and a “false negative” are obviously opposite types of errors.