Why half-life and 1 standard deviation = 68%?

I know what both a half-life and a standard deviation are, but I’m curious about the “why” underlying them:

  • why is the half-life considered the key measure as opposed to, say, the 99.9th-life (the point at which 99.9% of something has decayed so you could say it’s more or less gone)?
  • why did mathematicians decide that the first standard deviation would account for 68% of the population (logical alternatives I can think of are half or 66.6%)? My guess is that 68% is an artifact of the math that allows the second deviation to be 95% and the third 99%. But that’s just a guess.

For the first bit: you’re right that ‘half-life’ is a bit arbitrary. Whenever a quantity decays exponentially, you could talk about the ‘half-life’, ‘third-life’, ‘hundredth-life’ or ‘πth-life’ and they’re all perfectly well-defined and can be calculated from each other. I guess half-life is just intuitively appealing. In fact boffins rather like to talk about the ‘decay constant’, which is the reciprocal of the time taken for an exponentially decaying quantity to fall to (1/e)th of its initial value!
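Just to make the interchangeability concrete, here’s a rough Python sketch (the 5730-year figure is roughly carbon-14’s half-life, chosen only as an example):

    import math

    half_life = 5730.0                  # e.g. roughly carbon-14's half-life, in years
    lam = math.log(2) / half_life       # decay constant: t_half = ln(2) / lam
    tau = 1 / lam                       # 1/e "scaling time" (mean lifetime)
    life_999 = math.log(1000) / lam     # time to fall to 1/1000 of the starting amount

    print(lam, tau, life_999)
    print(life_999 / half_life)         # log2(1000) ~ 9.97, i.e. about ten half-lives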

For the second bit: the standard deviation of a sample, or a population, or a probability distribution, is defined in a way that’s completely independent of the proportion of the s/p/p.d. covered by the range ‘mean +/- 1 s.d’. It’s got nothing to do with whether that range is 50% or 68% or 95%, in fact depending on what collection of data you are looking at, it could be any percentage at all.

What the 68%/95%/99% stuff refers to, is a particular, ubiquitous distribution called the ‘Normal’ a.k.a ‘Gaussian’ a.k.a ‘Bell curve’. For that distribution, the +/- 1 s.d. range happens to cover about 68% of the area, and the +/- 2 s.d. range covers about 95% of the area. Wiki ‘Standard deviation’ to see how it’s actually calculated.

Not everything is distributed in a Gaussian fashion, but lots of common things are (eg people’s heights), and there’s a groovy thing called the Central Limit Theorem which basically states that the more you mash things up, the more any data you care to collect will look Gaussian.

I don’t know what the origins of using half-life are, and don’t know enough for an educated guess. I suppose it allows for quick in-the-head calculations vs. some other arbitrary percentage.

For standard deviation, they didn’t start with some percentage and declare that as a standard deviation, it’s the other way around. Mathematicians didn’t *decide* that 68.27% of a population is within ± 1 standard deviation. It’s just the way it is, for a specific type of distribution called the normal distribution. That percentage could be different for other distributions, samples, populations, etc. It’s very clear if you study the underlying theory of where this comes from and how standard deviation is actually defined. The Wikipedia article is a start, although you usually spend a couple of class periods on this in a college stat course.

Consider the simple distribution with two equal-sized modes, e.g. ten people of whom five are exactly 5-feet tall, and five are exactly 6-feet tall. The deviation from the mean (5-feet 6 inches) is always exactly 6 inches in this example, and that happens to be the standard deviation here. In that sense the s.d. is not defined arbitrarily.
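A two-line check of that example in Python, assuming the heights are expressed in feet:

    import statistics

    heights = [5.0] * 5 + [6.0] * 5      # five 5-footers and five 6-footers
    print(statistics.mean(heights))      # 5.5 feet (5 ft 6 in)
    print(statistics.pstdev(heights))    # 0.5 feet = 6 inches (population s.d.)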

I’d write “the more you add things together” rather than “the more you mash things up.” Assuming that arbitrary “mashing” leads to Gaussian distribution is a fallacy that played a role in recent financial crises.

For useful applications of radioisotopes, the material will be useless long before it reaches its 99.9th-life. For a Radioisotope Thermoelectric Generator (RTG) or a radiation therapy machine, you need to know well in advance when the source will need to be replaced.

If you need to know the 99.9th-life, it can be easily approximated as 10 half-lives.
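Quick sanity check of that rule of thumb in Python:

    print(0.5 ** 10)    # 0.0009765625 -- about 0.1% left after ten half-lives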

This is an important point. Consider, for example, what happens if the effects are multiplicative, rather than additive. Then you are studying a random variable that is the product of a large number of independent random variables. The logarithm of this is then the sum of the logarithms of a large number of random variables and, by the Central Limit Theorem, follows the normal (gaussian) distribution. The conclusion is that our “mashed-up” variable follows a so-called log-normal distribution. The additive case is more common than the multiplicative one, but log-normal distributions are still ubiquitous.
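Here’s a rough numpy simulation of the multiplicative case; the uniform(0.5, 1.5) factors are just an arbitrary choice of positive random effects:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    factors = rng.uniform(0.5, 1.5, size=(100_000, 50))   # 50 independent multiplicative effects
    products = factors.prod(axis=1)

    # The raw products are strongly right-skewed; their logarithms are close to
    # symmetric, as you'd expect if log(product) is approximately Gaussian.
    print(stats.skew(products), stats.skew(np.log(products)))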

To show what’s going on, let’s create an example of a population:

1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14

You can calculate the mean of these numbers as follows:

1 + 2 + 3 + 3 + 3 + 4 + 4 + 4 + 4 + 5 + 5 + 5 + 6 + 12 + 14 = 75

75 / 15 = 5

Note that the mean is not the median. The median is the number in the middle if you arrange them in ascending order. You can see that the number in the middle is 4, so that is the median. The mean has the property that the differences of the population values from it sum to 0. In this case, that means the following is true:

(5 - 1) + (5 - 2) + (5 - 3) + (5 - 3) + (5 - 3) + (5 - 4) + (5 - 4) + (5 - 4) + (5 - 4)
+ (5 - 5) + (5 - 5) + (5 - 5) + (5 - 6) + (5 - 12) + (5 - 14) = 0

O.K., then, what if instead of taking the sum of the differences from the mean, you took the sum of the squares of the differences from the mean, like this?:

((5 - 1)**2) + ((5 - 2)**2) + ((5 - 3)**2) + ((5 - 3)**2) + ((5 - 3)**2) + ((5 - 4)**2)
+ ((5 - 4)**2) + ((5 - 4)**2) + ((5 - 4)**2) + ((5 - 5)**2) + ((5 - 5)**2)
+ ((5 - 5)**2) + ((5 - 6)**2) + ((5 - 12)**2) + ((5 - 14)**2) = 172

The mean of this is 172 / 15 = 11.467 (approximately). Since this is a mean of squared differences, let’s take the square root of it. It turns out to equal 3.386 (approximately). Call 3.386 the standard deviation. How many of the numbers are within 3.386 of the mean 5? All of them except 1, 12, and 14. How many of the numbers are within (2 * 3.386) = 6.772 of the mean 5? All of them except 12 and 14.
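If you’d rather let Python do the arithmetic, here’s a small check of those numbers using the standard library’s statistics module:

    import statistics

    data = [1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14]
    mean = statistics.mean(data)                          # 5
    sum_sq = sum((mean - x) ** 2 for x in data)           # 172
    sd = statistics.pstdev(data)                          # population s.d., about 3.386
    print(mean, sum_sq, sd)
    print(sum(abs(x - mean) <= sd for x in data))         # 12 values within 1 s.d.
    print(sum(abs(x - mean) <= 2 * sd for x in data))     # 13 values within 2 s.d.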

You can show that for a normally distributed population, 68.27% of the population lie within one standard deviation of the mean, 95.45% lie within two standard deviations of the mean, 99.73% lie within three standard deviations of the mean, and so on for various values that we can easily calculate. (All of those percentages are approximations to the nearest .01%.) So there’s nothing arbitrary about 68.27%. It just falls out of the mathematics.
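Those percentages can be computed directly from the error function; for instance, in Python:

    import math

    for k in (1, 2, 3):
        inside = math.erf(k / math.sqrt(2))    # P(|X - mu| <= k*sigma) for a normal X
        print(k, round(100 * inside, 2))       # 68.27, 95.45, 99.73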

Exponential decay and the normal distribution are represented graphically by very specific shapes. Half-life and standard deviation are used as shorthand to communicate the scale of each. There is no deeper “why”; they are descriptive, not prescriptive. I suppose scientists could have used letters A-Z to describe the slope of the line for exponential decay instead, or fast, medium, slow, depending on how many different descriptors were useful.

In nuclear physics, decay lifetimes are often given as the “e-folding time” or “scaling time”, which is the time it takes to decay down to 1/e (about 37%) of the original value. This is useful since it’s simply the reciprocal of the decay constant and is a pretty natural constant to use given the usual way of writing the decay equation.

But I imagine half-life exists because it’s easier to calculate successive powers of 2 than powers of 1/e in your head (though you can estimate 1/e as 1/3, so it’s not that much harder).

I wrote some software that takes a number of samples from a voltmeter and calculates basic statistics on the results. I allow the user to discard samples outside x standard deviations to come up with a mean without outliers. I always thought it was neat that when discarding readings outside 1 standard deviation, you were always left with about 68% of the samples to compute a new mean. Two standard deviations left you with about 95% of the samples, and three left you with around 99%. This really drove home the concept of a Gaussian distribution.
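Here’s a toy simulation of that idea, assuming the voltmeter noise is roughly Gaussian around a made-up 5 V signal (the numbers are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    readings = rng.normal(loc=5.0, scale=0.01, size=10_000)   # hypothetical noisy 5 V signal

    for k in (1, 2, 3):
        mean, sd = readings.mean(), readings.std()
        kept = readings[np.abs(readings - mean) <= k * sd]
        print(k, len(kept) / len(readings), kept.mean())      # keeps ~68%, ~95%, ~99.7%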

The first sentence of my second to last paragraph should say:

> The mean of these squares is thus 172 / 15 = 11.467 (approximately).

For another way of thinking about the standard deviation, take that bell curve. In the middle (the hump of the bell), the curve is concave downwards. Out on the ends (the tails), it’s concave upwards. That means that there must be some point partway out where the concavity changes from down to up. If the bell curve is a standard Gaussian curve, then that point will be at exactly 1 standard deviation.
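If you want to check that inflection-point claim symbolically, sympy will do it:

    import sympy as sp

    x, mu = sp.symbols('x mu', real=True)
    sigma = sp.symbols('sigma', positive=True)
    # Gaussian probability density; the second derivative vanishes at the inflection points
    pdf = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sp.sqrt(2 * sp.pi))
    print(sp.solve(sp.diff(pdf, x, 2), x))    # [mu - sigma, mu + sigma]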

On the bit about 68% of the population being within one standard deviation, that’s always true of a Gaussian, but there’s also a related result for an arbitrary distribution (at least, for one with a standard deviation-- More on that in a moment). Chebyshev’s Theorem states that no more than 1/k² of the population can be found at more than k standard deviations out. So, for instance, you’re always guaranteed that at least 75% of the values will be within two standard deviations. Note that this is only a bound: You could have a distribution that has all of its values (or at least, an arbitrarily large portion of them) within one standard deviation.
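A quick empirical check of the Chebyshev bound on a decidedly non-Gaussian sample (exponentially distributed, just as an example):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=1.0, size=100_000)
    mean, sd = x.mean(), x.std()
    for k in (1.5, 2, 3):
        frac = np.mean(np.abs(x - mean) <= k * sd)          # fraction within k s.d.
        print(k, round(float(frac), 4), ">=", 1 - 1 / k ** 2)   # Chebyshev bound holds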

On the Central Limit Theorem (the one that causes Gaussians to show up all over the place), there are some conditions on it that are often overlooked. For one, the distributions of the things you’re adding have to all have more or less the same width. If I take 3 or 4 6-sided dice and add them all up, I’ll get something that’s a decent approximation to a Gaussian. If I then take those 4 6-sided dice and add a single hundred-sided die, now the results won’t really be anything close to a Gaussian.
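Here’s a rough simulation of the dice example, using excess kurtosis as a crude “how Gaussian is this” yardstick:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    four_d6 = rng.integers(1, 7, size=(100_000, 4)).sum(axis=1)
    with_d100 = four_d6 + rng.integers(1, 101, size=100_000)

    # Excess kurtosis is 0 for a Gaussian and about -1.2 for a uniform distribution;
    # expect roughly -0.3 for the 4d6 sum, and close to -1.2 once the d100 dominates.
    print(stats.kurtosis(four_d6), stats.kurtosis(with_d100))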

Another restriction on the Central Limit Theorem that’s more subtle is that all of the distributions you’re using have to all have a standard deviation-- That’s not always the case. For instance, the Lorentzian distribution does not have a standard deviation (or, if you prefer, its standard deviation is infinite), and it’s debatable whether it even really has a well-defined mean. And if you additively combine a bunch of Lorentzians, no matter how many of them you use, no matter what their widths are, the result will always be another Lorentzian, never a Gaussian.
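You can see the Lorentzian (Cauchy) behaviour numerically: averaging more and more samples never tightens the distribution the way it would for anything Gaussian-like:

    import numpy as np

    rng = np.random.default_rng(4)
    for n in (1, 100, 1_000):
        means = rng.standard_cauchy(size=(10_000, n)).mean(axis=1)
        q1, q3 = np.percentile(means, [25, 75])
        print(n, round(q3 - q1, 2))    # interquartile range stays ~2 no matter how many you average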

Return distributions in financial markets have long been observed to have longer tails than the Gaussian distribution would predict. Assuming a Gaussian approximation could conceivably be sensible under certain circumstances. But assuming a Gaussian and then claiming that you’ve had 4 once-in-every-ten-thousand-year disasters over the past 20 years is not.

In a more general context, the late and great econometrician Peter Kennedy said:

From A Guide to Econometrics 6E, p. 59.
Incidentally, the measure of both the “Pointiness” of the distribution and the length of the tails is called kurtosis.

It may be worth mentioning that you could add a fudge factor to the equation for standard deviation so that you get the “deviation” within which exactly one-half the population of a normal distribution exists, and you could rewrite the equation for the normal distribution in terms of that “half-population deviation” too - but it would complicate both equations quite a bit. So in that sense it’s “arbitrary” that we don’t complicate the probability-distribution equations for the sake of convenience, while we do talk about half-life instead of the scaling time (which falls out of the exponential decay equation very naturally), because talking about 1/2 is more convenient than talking about a decline to about 37%.
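For what it’s worth, that “half-population deviation” for a normal distribution is just a fixed multiple of the standard deviation (historically called the probable error), and the multiple is easy to compute:

    from scipy import stats

    k = stats.norm.ppf(0.75)                          # ~0.6745
    print(k)
    print(stats.norm.cdf(k) - stats.norm.cdf(-k))     # exactly half the population within +/- k*sigma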

The “half-population deviation” is actually a better measure of width, in some ways, than the standard deviation, in that it’s well-defined for all general distributions. So far as I know, it (and others along the same lines) is the only standard measure of width which is universally applicable like this.

The reason we prefer the standard deviation to a measure like this is that it contains information about the entire distribution. I can create multiple distributions that have the same interquartile range but very different standard deviations; on the other hand, if two distributions have the same standard deviation, there’s only so far apart that their interquartile ranges can be.

Edit: Assuming, of course, that there’s some sense in which the two distributions should be comparable.
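A quick sketch of that point (same interquartile range, very different standard deviations), using a contaminated Gaussian as a made-up example:

    import numpy as np

    def iqr(x):
        q1, q3 = np.percentile(x, [25, 75])
        return q3 - q1

    rng = np.random.default_rng(5)
    a = rng.normal(scale=1.0, size=100_000)
    # b: same Gaussian bulk, but 1% of the values come from a much wider Gaussian
    wide = rng.random(100_000) < 0.01
    b = np.where(wide, rng.normal(scale=50.0, size=100_000), rng.normal(scale=1.0, size=100_000))

    print(iqr(a), a.std())    # IQR ~1.35, s.d. ~1.0
    print(iqr(b), b.std())    # IQR barely changes, s.d. jumps to ~5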

Only a year or so after I started at my job, I noticed that some of the distributions we were interested in were closer to a Laplacian distribution than a Gaussian; since the Laplacian has much heavier tails, 3-, 4- and 5-standard-deviation events occurred much more frequently than expected. I convinced co-workers to use a 99th-percentile metric instead, which helped us avoid underestimating the likelihood of large results.
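To put some numbers on how different the tails are, here’s a comparison of tail probabilities for a Gaussian and a Laplace distribution with the same standard deviation:

    import numpy as np
    from scipy import stats

    for k in (3, 4, 5):
        gauss = 2 * stats.norm.sf(k)                              # P(|X| > k*sigma), Gaussian
        laplace = 2 * stats.laplace.sf(k, scale=1 / np.sqrt(2))   # Laplace with the same s.d.
        print(k, gauss, laplace, round(laplace / gauss))          # ratio grows rapidly with k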

Nice webpage on measures of dispersion: http://iridl.ldeo.columbia.edu/dochelp/StatTutorial/Dispersion/

  • Range - takes min and max values.
  • Standard Deviation
  • Root Mean Square (divides by N rather than N-1)
  • Interquartile Range (IQR)
  • Median Absolute Deviation (MAD)

There is also the Mean Absolute Deviation, alias Average Absolute Deviation, not listed on the page. Both use all the data in the sample.

  • Trimmed Variance

Good stuff. You can also generalize measures of dispersion as L1, L2, Linf, etc.; see the Wikipedia article “Norm (mathematics)”.

In the same way that L2 is the natural measure for dispersion of a Gaussian distribution, L1 is the natural measure for a Laplacian distribution, and Linf is for the uniform distribution.
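As a concrete illustration (reusing the little 15-number population from earlier in the thread), the three norms applied to deviations from the mean give three familiar dispersion measures:

    import numpy as np

    data = np.array([1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14], dtype=float)
    dev = data - data.mean()

    l1 = np.abs(dev).mean()             # mean absolute deviation
    l2 = np.sqrt((dev ** 2).mean())     # root-mean-square deviation (the population s.d.)
    linf = np.abs(dev).max()            # the single largest deviation from the mean
    print(l1, l2, linf)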

Here’s something I’ve always wanted to know. Why would anyone do this? What is the rationale for squaring everything like this? What happens when you do this? Color me clueless here.