I have a list of items, each associated with a mean and a standard deviation. I want to calculate the average standard deviation. How do I do this? Is it as simple as taking the average of the standard deviations given?
Here are the numbers, if it helps:
Item 1: Mean 115, StDev 282
Item 2: Mean 332, StDev 266
Item 3: Mean 1006, StDev 605
Item 4: Mean 31, StDev 24
Item 5: Mean 449, StDev 556
You can’t do that without some more info. The sample size of each is the most important missing piece, but the methodology seems flawed and odd in any case. Is there any reason you want to do that? I basically got bitch slapped once by a statistics guru for trying to average aggregate data, and the problem has stuck with me ever since. There are special meta-analysis statistics for things like this, but they fall under a special category and are rather advanced.
The reason I want to do this is because I’m doing some work on inventory management, risk pooling, and delayed differentiation. The idea is that I’m trying to release a product, “item A,” that is supposed to single-handedly replace items 1-5, and I’m trying to figure out how my overall standard deviation will change with this transition. But it seems from your response that I’m not even thinking about this concept correctly!
As I thought more about what you said, Shagnasty, I started to understand how my concept really didn’t make much sense. Would the proper way to think about this be to say that the aggregate standard deviation I had from items 1-5 was simply the sum of their standard deviations?
You have N random variables, X1 through XN. For each of these, you have a mean (mu(Xi)), and a standard deviation (stdev(Xi)). You want to find the mean of the standard deviations. A way of thinking about this problem would be that you want the formula of an estimator of the mean of a random variable, whose distribution is the standard deviation distribution.
Assuming that the standard deviations are of normally-distributed random variables, you’d want the mean of this distribution. At least, if my half-remembered statistics are correct.
It has been forever since I took statistics, but to me standard deviations as large as the ones you are getting would just mean that the results are random (not predictable).
I agree. And I don’t think you can combine standard deviations usefully unless you have reason to believe that the instrument error and scatter are the same in all of the experiments. That doesn’t appear to be the case in your data.
It’s a little hard to believe that a mean of 449 and a std dev of 556 represents a Gaussian distribution.
Not necessarily. Those numbers are meaningless if we don’t know what they represent. For instance, I use the standard deviation often for pattern analysis/recognition:
Item 1: Mean 115, StDev 282
This would tell me I can realistically classify input data between -449 and 679 (the mean plus or minus two standard deviations) as a positive match. If my test data is uniformly distributed between, say, -10,000,000 and 10,000,000, that’s actually a very narrow filter.
Is there a possibility those numbers are variance and not standard deviation? Because, like you say, those are some wacky numbers. Mean 115, standard deviation 282? It’s been years since I’ve taken stats, too, but if I remember right, that data would basically look like a lot of low numbers, and some very high numbers to push the mean up to 115 and get a standard deviation that high. To give you an idea, the data set {0, 0, 0, 0, 600} has a mean of 120 and a sample standard deviation of about 268 (population standard deviation of 240).
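If anyone wants to check those figures, here is a quick sketch using Python's standard `statistics` module. The variable names are just for illustration.

```python
import statistics

# The example data set: mostly zeros with one large value
data = [0, 0, 0, 0, 600]

mean = statistics.mean(data)
sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print(f"mean = {mean}")
print(f"sample standard deviation = {sample_sd:.1f}")
print(f"population standard deviation = {population_sd:.1f}")
```

The gap between the two standard deviations comes purely from dividing the sum of squared deviations by 4 versus 5.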
Multiply each sample average by the number of trials in the sample. Add them together and divide by the total number of trials in all the samples combined. This is the aggregate average. Using the aggregate average and all of the individual readings, measurements, or whatever, compute an overall std dev in the usual way.
All of the samples must be of the same thing, such as men’s hat sizes in various cities. The samples must be random and the measurements taken in the same way for this to have any meaning.
*Practical Statistics, Russell Langley, Drake Publishers, Inc., New York, NY
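The procedure above might look something like this in Python. This is only a sketch: the readings in `samples` are made up for illustration (the original poster gave only means and standard deviations, not raw data), and it assumes all samples measure the same quantity in the same way.

```python
import statistics

# Hypothetical raw readings for three samples of the same quantity
samples = [
    [10.0, 12.0, 11.0],
    [9.0, 13.0],
    [11.0, 10.0, 12.0, 13.0],
]

sample_means = [statistics.mean(s) for s in samples]
sample_sizes = [len(s) for s in samples]
total_n = sum(sample_sizes)

# Weight each sample average by its number of trials, then divide by the total
aggregate_mean = sum(m * n for m, n in zip(sample_means, sample_sizes)) / total_n

# Overall std dev computed from every individual reading about the aggregate mean
all_readings = [x for s in samples for x in s]
overall_sd = statistics.stdev(all_readings, xbar=aggregate_mean)

print(f"aggregate mean = {aggregate_mean:.3f}")
print(f"overall std dev = {overall_sd:.3f}")
```

Note that this needs the individual readings and the sample sizes, which is exactly the information missing from the numbers posted at the top of the thread.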
I think the other replies are ignoring the supply chain problem you are tackling. The answer to your question is fairly simple. Assuming the demands for the five products are independent, the std dev of demand for the new product will be the square root of the sum of the variances (squares of your std devs) of the five replaced products. Now look at the resulting coefficient of variation of demand for item A. You can easily determine the impact on inventory safety stock.
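Applied to the five items posted at the top of the thread, that calculation can be sketched as follows. It assumes the five demands are independent, and that demand for item A is simply the sum of the five old demands, so its mean is the sum of the five means.

```python
import math

# Means and standard deviations of items 1-5, as posted
means = [115, 332, 1006, 31, 449]
stdevs = [282, 266, 605, 24, 556]

# Mean of the combined demand: means always add
combined_mean = sum(means)

# Std dev of the combined demand, ASSUMING independence:
# variances add, so take the square root of the sum of squares
combined_sd = math.sqrt(sum(s ** 2 for s in stdevs))

# Coefficient of variation of demand for the replacement item
cv = combined_sd / combined_mean

print(f"combined mean = {combined_mean}")
print(f"combined std dev = {combined_sd:.1f}")
print(f"coefficient of variation = {cv:.2f}")
```

For comparison, the individual coefficients of variation (std dev divided by mean) of items 1-5 range from about 0.6 to about 2.5, so under the independence assumption the pooled item is relatively less variable than any of the items it replaces.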
I asked a similar question a while back, and got a good formula for the total variance (i.e., the variance of all items taken as one group). As has been noted, the meaningfulness of such numbers depends on methods of collection among other things.
Above and beyond all else, remember that, though you can define a standard deviation for most distributions, it’s really only useful for a Gaussian distribution. Now, many distributions one actually encounters can be reasonably approximated by a Gaussian, but not all, and sometimes, that can get you into trouble. Unfortunately, from just the data you gave us, we have no way of telling how Gaussian your distributions actually are.
Come to think of it, another word of warning: Even if these things are all (approximately) Gaussian, the addition in quadrature that nivlac talks about assumes that they’re all independent. But if these distributions are demands for products which are sufficiently similar that they can all be replaced by the same thing, then it’s likely that the demands will be highly correlated. That is to say, when demand for product A is high, it’s likely that demand for products B, C, D, and E will also all be high. In the limiting case of perfect correlation, you want to just add the standard deviations together directly; with strong but imperfect correlation, the answer falls somewhere between that and the quadrature sum.
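To see how much the correlation matters for the numbers in this thread, here is a rough sketch. The function `pooled_sd` is a made-up helper, and it makes the simplifying assumption that every pair of products has the same correlation `rho`, which real demand data almost certainly won't satisfy exactly.

```python
import math

# Standard deviations of items 1-5, as posted
stdevs = [282, 266, 605, 24, 556]

def pooled_sd(stdevs, rho):
    """Std dev of the summed demand, ASSUMING one common pairwise correlation rho."""
    variance = 0.0
    for i, si in enumerate(stdevs):
        for j, sj in enumerate(stdevs):
            # Diagonal terms are the variances; off-diagonal terms are covariances
            variance += si * sj * (1.0 if i == j else rho)
    return math.sqrt(variance)

for rho in (0.0, 0.5, 1.0):
    print(f"rho = {rho}: pooled std dev = {pooled_sd(stdevs, rho):.1f}")
```

With `rho = 0` this reduces to addition in quadrature, and with `rho = 1` it reduces to the plain sum of the standard deviations, so those two figures bracket the answer for any non-negative correlation.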