To show what’s going on, let’s create an example of a population:
1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14
You can calculate the mean of these numbers by adding them up and dividing by how many there are (15):
1 + 2 + 3 + 3 + 3 + 4 + 4 + 4 + 4 + 5 + 5 + 5 + 6 + 12 + 14 = 75
75 / 15 = 5
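The same arithmetic as a minimal Python sketch. Python is an assumed choice here, and the variable name `population` is illustrative. It computes the mean by hand and cross-checks it against the standard library's `statistics.mean`:

```python
from statistics import mean

# The example population from the text.
population = [1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14]

total = sum(population)   # 75
count = len(population)   # 15
print(total / count)      # 5.0

# The standard library's mean agrees with the hand calculation.
print(mean(population) == total / count)  # True
```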
Note that the mean is not the same as the median. The median is the number in the middle when you arrange the numbers in ascending order. The list above is already in ascending order, and its middle (eighth) number is 4, so the median is 4. The mean, on the other hand, has the property that the differences between the mean and the members of the population add up to 0. In this case, that means the following is true:
(5 - 1) + (5 - 2) + (5 - 3) + (5 - 3) + (5 - 3) + (5 - 4) + (5 - 4) + (5 - 4) + (5 - 4) + (5 - 5) + (5 - 5) + (5 - 5) + (5 - 6) + (5 - 12) + (5 - 14) = 0
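A minimal Python sketch that checks both claims, using the same illustrative `population` list. `statistics.median` returns the middle value of the sorted data, and the differences (mean - x) sum to 0:

```python
from statistics import median

# The example population from the text.
population = [1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14]
mu = sum(population) / len(population)  # 5.0

# Median: the middle (8th of 15) value once the data are sorted.
print(median(population))  # 4

# Differences written as (mean - x), matching the sum in the text.
differences = [mu - x for x in population]
print(sum(differences))  # 0.0
```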
O.K., then, what if instead of taking the sum of the differences from the mean, you took the sum of the squares of the differences from the mean, like this:
(5 - 1)**2 + (5 - 2)**2 + (5 - 3)**2 + (5 - 3)**2 + (5 - 3)**2 + (5 - 4)**2 + (5 - 4)**2 + (5 - 4)**2 + (5 - 4)**2 + (5 - 5)**2 + (5 - 5)**2 + (5 - 5)**2 + (5 - 6)**2 + (5 - 12)**2 + (5 - 14)**2
= 16 + 9 + 4 + 4 + 4 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 1 + 49 + 81
= 172
The mean of these squared differences is 172 / 15 = 11.467 (approximately). Because the differences were squared, let’s take the square root to get back to the scale of the original numbers. It comes to approximately 3.386. Call 3.386 the standard deviation. How many of the numbers are within 3.386 of the mean 5? Twelve of the 15: all of them except 1, 12, and 14. How many are within (2 * 3.386) = 6.772 of the mean? Thirteen: all of them except 12 and 14.
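A minimal Python sketch of the calculation above. The names are illustrative, and `within` is a hypothetical helper, not a library function. It divides by the population size N, as `statistics.pstdev` also does, so the two results should agree:

```python
import math
from statistics import pstdev

# The example population from the text.
population = [1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 12, 14]
mu = sum(population) / len(population)  # 5.0

# Sum of squared differences from the mean, then the mean of those squares.
sum_of_squares = sum((mu - x) ** 2 for x in population)
mean_of_squares = sum_of_squares / len(population)
sigma = math.sqrt(mean_of_squares)  # population standard deviation

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, pstdev(population))


def within(k):
    """Return the values lying within k standard deviations of the mean."""
    return [x for x in population if abs(x - mu) <= k * sigma]


print(sum_of_squares)                   # 172.0
print(round(mean_of_squares, 3))        # 11.467
print(round(sigma, 3))                  # 3.386
print(len(within(1)), len(within(2)))   # 12 13
```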
You can show that for a normally distributed population, 68.27% of the population lies within one standard deviation of the mean, 95.45% lies within two standard deviations, and 99.73% lies within three, and the fraction for any other multiple of the standard deviation can be calculated the same way. (All of these percentages are rounded to the nearest 0.01%.) Our small example population is not normally distributed, which is why its proportions (12 of 15 within one standard deviation, 13 of 15 within two) differ from these figures. So there’s nothing arbitrary about 68.27%: it falls straight out of the mathematics of the normal distribution.
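These percentages can be checked with the error function. For a normal distribution, the fraction lying within k standard deviations of the mean is erf(k / sqrt(2)). Here is a minimal Python sketch using the standard library's `math.erf`. The function name `fraction_within_normal` is hypothetical:

```python
import math


def fraction_within_normal(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k} SD: {fraction_within_normal(k):.2%}")
```

This should print 68.27%, 95.45%, and 99.73% for k = 1, 2, and 3. The same function works for non-integer k (for example, 1.5).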