Four standard deviations corresponds to a one-sided p-value of about 1 in 31,000, and six sigma to about 1 in a billion. Five sigma, at roughly 1 in 3.5 million, is a nice middle ground between something that could conceivably happen under the null and really strong evidence against the null.
It is a convention widely used in many areas of science. There is no particular reason for choosing that particular number, but it is in more or less the right, sensible range, and you have to choose something. Science depends on such conventions.
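Those tail probabilities are easy to check yourself. Here's a quick sketch (my own, not from anyone's post) that converts a sigma level into a "1 in N" one-sided Gaussian tail probability using only the standard library:

```python
import math

def one_in(z):
    """Return N such that a z-sigma one-sided Gaussian tail is ~1 in N."""
    p = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided tail probability
    return 1 / p

for z in (4, 5, 6):
    print(f"{z} sigma ~ 1 in {one_in(z):,.0f}")
# 4 sigma is about 1 in 31,600; 5 sigma about 1 in 3.5 million;
# 6 sigma about 1 in a billion.
```

Note these are one-sided figures; a two-sided convention would double each probability (halve each N).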
Another convention that, I have been told, is used in particle physics (I don’t know about other fields) is the “anti-tennis” rule: if a data point on a graph falls on one of the lines bounding an area of interest, it is considered to be out of the area (unlike in tennis, where a ball that hits the line is “in”). There is no very good reason why this is better than using the tennis rule would be - under the tennis rule you would get a few more false positives and a few fewer false negatives - but you need to have some rule that is consistent across different experiments and different laboratories, or you lose track of which results you can trust, and how much.
John McEnroe would have made a lousy physicist, I guess.
It’s interesting for me to learn that 5 SDs is a fairly common standard in some fields (physics). In my area, clinical medicine, we usually settle for, or at least hope to see, p < 0.01 (depending on how many comparisons were made, of course). I suppose that this less demanding threshold has roots in practical considerations: clinical trials are so contaminated with confounding variables that any ‘signal’ in the data tends to be faint, not rising much above the noise.
The joke in particle physics is that “three sigma discoveries are found to be real half the time,” which is an exaggeration of a real effect. Because of the difficulty in estimating backgrounds, sampling effects, and other similar issues, particle physics gets a lot more “false positives” than one would assume from a purely statistical standpoint.
Hence using five sigma when one might think that four sigma (which statistically corresponds to about one chance in 31,000 of being wrong) would be adequate.
Googling about experimental physics and standard deviations, I found this page.
IANAPhysicist, so I’m not sure how accurate that page is, but the basic understanding I took away from the explanation on that site is that errors in experimental physics can be statistical (effects of random chance) or systematic (a result of limits/errors in the experiment, measurement, analysis, etc…). 5-sigma is used as a standard because it’s very unlikely that statistical fluctuations alone would produce that large an effect, and in the past there have been “particles” that were “discovered” at a 2 or 3 sigma level that turned out to be not really there. Also, in particle collider experiments they often have two different experimental setups testing the same hypothesis - they independently confirm each other, as they are unlikely to both have exactly the same errors.
Can anyone more knowledgeable confirm if I got the gist of it right?
Professor Higg found it and claimed personal ownership. Then Higg’s Boson, having a mind and a will of its own, slipped away from him and got lost somewhere in the universe, much like the One Ring of Power sometimes did.
There is only one Higg’s Boson in the universe. (Note that all references to it in the literature are always in the singular.) It zips around the universe, dispensing mass to any other particles it encounters.
Now, these various physicist teams are competing to find it and claim it as their own, knowing full well that whoever finds and possesses it will surely rule the [del]world[/del] universe. What are the chances of it being found in the LHC, when at any point in time it could be anywhere in the universe? But Higg’s One Boson has a will of its own, and seeks to return to its rightful master. Unless it can be cast into the Big Bang whence it was forged, the Universe will surely come to ruin! :eek:
False positives don’t come only from mistakes. There are tons of measurements being made out there, and in the case of general purpose particle physics experiments like one finds at colliders (LEP, Tevatron, SLC, LHC, …), a given search may have tens or hundreds of independent statistical avenues for fluctuations to mimic a discovery of something new. If you are looking for a very specific feature in the data, the statistical analysis is straightforward. If you are looking for anything fishy, then it can be hard to count exactly how many ways something fishy can happen. Even if the measurement and the statistical methods are reported completely factually, the statistical implications can be misinterpreted or, at least, non-intuitive. See the XKCD example.
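To put a toy number on that: if a search has many independent channels in which a fluctuation could show up, the chance that at least one of them crosses 3 sigma is much larger than the single-test figure suggests. A minimal sketch (the 100-channel count is a made-up illustration, not from any real analysis):

```python
import math

def one_sided_p(z):
    """One-sided Gaussian tail probability for a z-sigma excess."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p3 = one_sided_p(3.0)            # chance of a 3-sigma fluke in ONE pre-specified test
n_tests = 100                    # hypothetical number of independent search channels
p_any = 1 - (1 - p3) ** n_tests  # chance of a 3-sigma fluke SOMEWHERE

print(f"single test: {p3:.2e}; any of {n_tests} tests: {p_any:.1%}")
# A single 3-sigma test fires falsely about 0.13% of the time,
# but across 100 independent tests the chance is roughly 1 in 8.
```

This is the "look-elsewhere" effect in its simplest form; real analyses have correlated channels, so the arithmetic is only a rough guide.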
The 5-sigma number has more or less been determined empirically by the community as a useful benchmark based on the accumulated history of false positives. 3-sigma signals come and go; 4-sigma signals depart once in a while. But these aren’t hard-and-fast cut-offs for believability by any means. The nature of the experimental errors (Are they driven solely by Poisson statistics? Are there any tough-to-estimate systematic uncertainties?) and methodology play a big role in making a skeptical set of people believe that a discovery is really a discovery. Take a look at the superluminal neutrino claim. It was 6 sigma. (6 sigma!) But the fact that 6 is greater than 5 didn’t make people believe it. On the contrary, the experimental setup was frighteningly Rube Goldbergian, and the probability of a time offset mistake was much higher than 2-in-a-billion.
I’m going to dispute this one. I’ve never come across this as a convention of any sort, and it only sort of makes sense as a relevant scenario at all. (Boundaries are, whenever possible, defined independently of the data that will be bounded.)