Is Snopes wrong? (Statistical Significance)

Snopes addresses the Australian gun buyback program here: Australian Guns. In it they make a statement that doesn’t seem right to me:

My statistics is pretty rusty, but my gut is telling me that this is incorrect. The sample size is 4.5 million, so a 171% increase in something should be statistically significant. I think.

Remind me how this works.

It is because you are comparing the homicides against the background population. In 1996 roughly 0.00016 percent of the population was affected. In 1997 it was roughly 0.00042 percent. The difference between these results is statistically insignificant.

The trouble comes in comparing very rare instances of an event within a relatively large population. The numbers compared to each other seem to show a great difference, but when you compare them to the total population you can see how statistical noise within the population can drive those numbers. So 7 and 19 by themselves seem to show a large difference, but when you factor in that they occur within a population of 4.5 million, you can see how the difference is, proportionally, quite small.

(Note: I'm not a statistics geek, so I hope I got that right.)

I think it's because of the small number of firearm-related homicides compared to the overall population. As an extreme example, if you have 1 firearm-related homicide in one year and then 3 the next, you have a 200% increase, but I don't think it would be considered statistically significant. The numbers are too small to reach any conclusions.

A standard statistical test such as the t-test divides the effect by the standard error (sd/sqrt(n)). Here the effect is the difference in death rates between the two years, and with events this rare relative to the population, that difference is tiny; unless the underlying variability is correspondingly tiny, the resulting test statistic stays small.

To give a bound on significance, the Poisson Distribution would work as a model. If it’s not significant under Poisson, it’s almost certainly not significant, because I think any deviations from Poisson would almost certainly involve clumping - e.g. more than one homicide per incident, copycat killings.

If you assume a Poisson distribution with a mean of 7 deaths, then the probability of 19 or more deaths in a year is about 0.013%, which would generally be considered significant.

If you assume a mean of 19 deaths, then the probability of 7 or fewer is about 0.15%, also generally considered significant.

If you assume a mean of 13 (the average of 7 and 19), then you'd see 7 or fewer about 5.4% of the time and 19 or more about 7% of the time. Neither of those is significant at standard levels.
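For anyone who wants to check those tail probabilities, here is a minimal sketch using scipy; the counts of 7 and 19 come from the thread, and the mean you feed in is exactly the assumption being made in each case.

```python
# A quick check of the tail probabilities above. P(X >= 19) is the survival
# function evaluated at 18; the mean passed in is the assumption in each case.
from scipy.stats import poisson

for mean in (7, 13, 19):
    p_low = poisson.cdf(7, mean)    # probability of 7 or fewer deaths
    p_high = poisson.sf(18, mean)   # probability of 19 or more deaths
    print(f"assumed mean {mean:2d}:  P(<= 7) = {p_low:.4%}   P(>= 19) = {p_high:.4%}")
```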

So it all depends on how you look at it.

Hmmm… Just to be clear, this is a question about pure, plug-numbers-into-a-formula, statistical significance. Forget that it’s about homicides. In fact you could restate the question like this:

Control Group: Size = 4.5 million; number of hits = 7
Test Group: Size = 4.5 million; number of hits = 19

Is it statistically different?
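For the record, here is one way to plug those numbers into a formula: a rough sketch of a pooled two-proportion z-test on the restated problem. It is only one reasonable choice among several, and as later replies point out, the answer depends on the model you assume.

```python
# Rough sketch: a pooled two-proportion z-test on the restated numbers
# (7 "hits" vs. 19 "hits", each out of 4.5 million). Only the standard
# library is needed; the normal tail comes from erfc.
from math import sqrt, erfc

def two_proportion_z(hits1, n1, hits2, n2):
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, erfc(abs(z) / sqrt(2))   # z statistic and two-sided p-value

z, p = two_proportion_z(7, 4_500_000, 19, 4_500_000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

For what it's worth, that comes out in the neighborhood of z ≈ 2.4 (two-sided p ≈ 0.02), though with counts this small a normal-approximation test is only a rough guide, which is part of why the Poisson treatment later in the thread is a more natural fit.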

To put it in layman's terms, statistical significance doesn't mean the same thing as the common word significant. That is a common misunderstanding. All statistical significance tests for is whether differences between sets of results can be put down to chance alone, at some established threshold (i.e. being 95% or 99% sure that they aren't). That is difficult to achieve with a very tiny effect among a very large population. In this case, just a few double murders in the same year could increase the raw percentages greatly by chance without indicating any overall trend.

When you referenced a 171% increase in gun-related deaths, that is not what statistical significance is about. That is called an effect size, and it is a separate measure. First you have to figure out whether the effect is more than you would expect from chance alone at some level of confidence (statistical significance), and then you express how big that difference is with other measures (effect size).

This concept can easily go the other way as well, and often does. With very large samples, you can also show a statistically significant effect that is trivial in percentage terms. For example, you could run a study with 100,000 people in matched groups that either took Vitamin Q or did not. You may find that the people who took Vitamin Q have a statistically significantly lower risk of heart attacks. The media would be all over that, just like they are with many such stories. However, the effect size may be only a 1% difference over a lifetime and therefore meaningless to the vast majority of people. That type of thing really does happen all the time in studies. Statistical significance is a key test to determine whether differences really exist, but it doesn't mean that those differences are truly significant in the common usage of the term.
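To make that concrete, here is a toy version of the Vitamin Q scenario; the group size and the 10% vs. 9% lifetime heart-attack rates are invented purely for illustration.

```python
# Toy illustration of the point above: with 100,000 people per group, even a
# modest difference can be overwhelmingly "statistically significant", yet the
# effect size may still be small in everyday terms. All numbers are invented.
from math import sqrt, erfc

n = 100_000            # people per group (hypothetical)
hits_no_q = 10_000     # lifetime heart attacks without Vitamin Q (10%, made up)
hits_q = 9_000         # lifetime heart attacks with Vitamin Q (9%, made up)

p1, p2 = hits_no_q / n, hits_q / n
pooled = (hits_no_q + hits_q) / (2 * n)
se = sqrt(pooled * (1 - pooled) * (2 / n))
z = (p1 - p2) / se
p_value = erfc(abs(z) / sqrt(2))                       # two-sided p-value

print(f"z = {z:.1f}, p = {p_value:.1g}")               # overwhelmingly "significant"
print(f"absolute difference in risk: {p1 - p2:.1%}")   # ...but only about one point
```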

No, in order to do statistics, you have to choose a model, and you have to use your knowledge of the phenomenon under study to assess whether the assumptions of the model are met. (There are "model-free" statistical methods, but they aren't applicable here.)

To take a trivial example, without knowledge of the phenomenon, how do you know the observed value cannot fall below zero?

No, it's not. I can run an ANOVA (Analysis of Variance) or t-test if you want, but I don't think it is necessary in this case because it is so far off.

Before you run one of those tests for statistical significance, you have to specify the level of confidence that you want as well (95% and 99% are the most common ones). That sets the threshold for how unlikely the results would have to be under chance alone before you call them significant. There is always some chance that nothing fundamental is different between the samples and you just ended up with a fluke.

It’s been 40 years since I’ve had a statistics class, so…

Well, that’s the way it’s typically done but you don’t need to do that in advance. IIRC some of these formulas will crank out the level of confidence as the final result. But if you like, let’s use a 95% confidence level for significance.

As OldGuy said, you have to decide what assumption you’re going to make about the mean.

Also, is it really correct to say that the sample size is 4.5 million? If what you’re interested in is the rate per year, then it seems to me your sample size is 1 for each treatment, because you’re comparing 1 control year to 1 experimental year.

The point is that if something is significant at alpha 0.05 but not 0.01, it would be very bad science to pick, after the fact, the level that supports your hypothesis. Otherwise you are correct that there is nothing practical stopping you from looking at various levels of confidence.

We see this in sensational newspaper reports: an "epidemic of knife and gun crime", "spiralling out of control", creating a level of youth disorder that adds up to "a national crisis".

This flies in the face of the actual statistics which show crime rates peaked in 1995, then fell by 42% over the subsequent 10 years.

This kind of reporting can be self-fulfilling: young people who are not sophisticated enough to see through it will arm themselves with knives because of the perceived threat.

I now feel that I must arm myself for protection from these unsophisticated panicky knife-wielding young people that, I assume, you must have shown to have increased in number by a statistically significant amount? :slight_smile:

Related to sensational headlines, here’s a closer blog look at a recent (12/21/2015) one:
Skepticism About Study Linking Antidepressants And Autism

It’s worth remembering that the whole point is that you’re trying to answer the question “If there was no cause-and-effect, just random up and down fluctuations, how likely would it be that random fluctuations alone would produce this result we just observed?” and then hope that your answer comes out really really unlikely, like less than 1% chance.

For example, it’s very easy to believe that 7 people might be struck by lightning in a given year, and 19 people struck by lightning in the following year, and have that difference be entirely due to random chance, with no cause-and-effect at all. To really be convinced that there was something new, something different, causing more people to get struck by lightning, you’d need to make observations over a much larger time scale.
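Here is a quick way to put a number on that kind of question, at least under one simple assumption: treat both years as random draws from the same Poisson distribution (mean 13, the midpoint used earlier, purely as an assumption) and see how often chance alone produces a jump at least as big as 7 followed by 19. Other framings of "at least this extreme" will give different answers, which is exactly the thread's point.

```python
# Quick simulation of the "random fluctuations only" world: both years share
# the same underlying mean (13 is assumed here, the midpoint used earlier),
# and we count how often pure chance produces a jump as big as 7 -> 19.
import numpy as np

rng = np.random.default_rng(42)
trials = 200_000
mean = 13                              # assumed common rate for both years

year1 = rng.poisson(mean, trials)
year2 = rng.poisson(mean, trials)
big_jump = (year2 - year1) >= 12       # a year-to-year jump at least as big as 7 -> 19

print(f"chance-alone jumps this big: {big_jump.mean():.2%} of simulated year-pairs")
```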

Large universes will almost always show noticeably large numbers of events of almost anything you can think of. (Yes, 19 homicides might be noticeably large.) Within those events, there will be clumping just by chance in addition to any possible clumping because of a real effect.

The real world example that gets the most press is cancer clusters. In a population of 320 million - the whole U.S. - there will be millions of cancer cases and hundreds or thousands of even rare cancers, sometimes in close proximity. So is something causing those cancers, or is it just random statistical clumping?
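To see how easily such clusters can arise from nothing, here is a toy sketch: give a few thousand identical towns the exact same underlying cancer rate and look at how the unluckiest town compares to the average. All the numbers below are invented for illustration.

```python
# Toy sketch of chance clustering: every town has the *same* underlying rate
# (an average of 2 cases per year, an invented figure), yet some towns will
# look like alarming "clusters" purely by luck of the draw.
import numpy as np

rng = np.random.default_rng(0)
towns = 3_000                        # number of equal-sized towns (made up)
cases = rng.poisson(2.0, towns)      # this year's case counts, same risk everywhere

print(f"typical town: {cases.mean():.1f} cases")
print(f"unluckiest town: {cases.max()} cases, with no cause beyond chance")
```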

The only thing you can be 100% certain of is that the people involved in those clumps won’t believe - sometimes rightly, sometimes wrongly - that the clumps are merely statistical, even if they are significant.

I’d make statistics, along with elementary economics, mandatory high school courses.

Statistics is a difficult subject, even at the most basic level. Plenty of people can’t even grasp the concept that large groups are easier to predict than small groups.
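That last point is easy to demonstrate: the relative year-to-year wobble in a count shrinks roughly as one over the square root of its expected size. A quick sketch, with arbitrary group sizes:

```python
# Small demo that large groups are easier to predict than small ones: the
# relative spread of a count falls off roughly as 1/sqrt(expected size).
# The group sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
for expected in (10, 1_000, 100_000):
    counts = rng.poisson(expected, 10_000)     # many simulated "years"
    spread = counts.std() / counts.mean()      # relative fluctuation
    print(f"expected {expected:>7,} events: typical wobble of about {spread:.1%}")
```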