I really hope there is someone who knows their stuff well enough to answer this question, as it has been seriously bugging me! :mad:
There is a set of data that I have to analyse at work.
Typical results for the weighted mean and weighted standard deviation are 30 and 150 respectively.
The weights for individual items can vary by 2 or 3 orders of magnitude.
Sometimes the mean is low enough (and the standard deviation high enough) for the mean to seem statistically indistinguishable from 0 at any reasonable confidence level.
I would like to be able to determine this precisely using the standard error of the mean, but I don’t want to use a simple (evenly weighted) mean, std dev and standard error.
I have come up with the following solution and I want to know what the experts think:
The “usual formula” for standard error is: SE = (STDEV)/n^0.5
If I square both sides and expand STDEV then:
SE^2 = (sum((xi-X)^2))/n^2, where X is the mean and xi is the ith observation
simplifying leaves me with:
SE = ((sum((xi-X)^2))^0.5)/n
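As a quick numerical sanity check of that algebra, here is a sketch in Python (the data values are made up; note this assumes STDEV is the population standard deviation, i.e. divided by n, which is what makes the expansion work — with the sample standard deviation the two expressions differ):

```python
import math

# Made-up data for illustration
x = [12.0, -3.5, 40.0, 7.25, -18.0]
n = len(x)
X = sum(x) / n  # the mean

# "Usual formula": SE = STDEV / n^0.5, with STDEV the population
# standard deviation (divide by n, matching the derivation above)
stdev = math.sqrt(sum((xi - X) ** 2 for xi in x) / n)
se_usual = stdev / math.sqrt(n)

# Simplified form: SE = (sum((xi - X)^2))^0.5 / n
se_simplified = math.sqrt(sum((xi - X) ** 2 for xi in x)) / n

assert abs(se_usual - se_simplified) < 1e-12
```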
I then modify this to include weights
SEw = ((sum((wi(xi-X))^2))^0.5)/sum(wi^0.5), where wi is the weight of the ith observation and SEw is the weighted standard error.
For evenly weighted data sets the above formula gives exactly the same answer as the “usual formula”.
For unevenly weighted data sets, higher-weighted observations have a greater effect on the standard error.
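A sketch of that proposed formula in Python (made-up data; I have assumed X is the weighted mean, which the post does not state explicitly):

```python
import math

def weighted_se(x, w):
    """Proposed SEw = (sum((wi*(xi - X))^2))^0.5 / sum(wi^0.5).
    X is taken to be the weighted mean (an assumption)."""
    X = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    num = math.sqrt(sum((wi * (xi - X)) ** 2 for wi, xi in zip(w, x)))
    den = sum(wi ** 0.5 for wi in w)
    return num / den

# Made-up data
x = [12.0, -3.5, 40.0, 7.25, -18.0]
n = len(x)

# With all weights equal to 1, this reduces to the "usual formula"
X = sum(x) / n
se_usual = math.sqrt(sum((xi - X) ** 2 for xi in x)) / n
assert abs(weighted_se(x, [1.0] * n) - se_usual) < 1e-12
```

One caveat worth noting: with all weights equal to some constant c other than 1, this formula gives sqrt(c) times the usual answer, so "evenly weighted gives exactly the same answer" holds when the equal weights are exactly 1.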
As I was writing this I thought of another formula that would give the same results as the “usual formula”: if I were to rebase the weightings such that their sum equals the number of observations, I could just use wSTDEV/sum(rwi), where wSTDEV is the weighted standard deviation and rwi is the rebased weight of the ith observation.
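The rebasing step itself is straightforward; a sketch in Python (the weights are made up for illustration):

```python
# Made-up raw weights spanning a couple of orders of magnitude
w = [5.0, 50.0, 500.0, 2.0]
n = len(w)

# Rebase so the weights sum to the number of observations
rw = [wi * n / sum(w) for wi in w]
assert abs(sum(rw) - n) < 1e-12

# For an evenly weighted set, every rebased weight is exactly 1,
# so weighted formulas collapse to their unweighted counterparts
ew = [7.0] * n
rew = [wi * n / sum(ew) for wi in ew]
assert all(abs(ri - 1.0) < 1e-12 for ri in rew)
```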
So both of these methods seem logical to me and both produce reasonable results (for the data sets I have checked). I am a little surprised by the lack of decent search results on the topic though.
Can anyone add any insight to this? Does anyone know how to compute confidence intervals from either of these formulae?
I’m not sure I’m reading what you’ve written correctly, but are you comparing the observed values to the mean or to the estimate?
Because it looks like you’re summing the [square of the deviation (rather than the error) divided by the number of observations] and then raising the whole thing to the 1/2 power.
For a standard error, you would sum the squared error terms (predicted minus observed), divide by the number of observations minus the number of parameters, and then take the whole thing to the 1/2 power.
Using n-2 (or however many parameters there are) for the degrees of freedom helps ensure an unbiased standard error of estimate. I’m not sure if some of the differences are due to field; I come from a finance background, fwiw.
Thanks for the response, Darth Panda.
It looks like you are talking about the standard error of the estimate in a regression model.
I am talking about the standard error of the mean.
From wikipedia:
Quote Darth Panda:
This is the formula for standard deviation.
To get to the standard error of the mean you then divide by the square root of the number of observations.
My question is how to do this for weighted observations.
First estimate the population variance sigma^2 by the weighted variance sigma_w^2 = sum(wi(xi - xw)^2)/(sum(wi - 1)), where xw is the weighted mean. This assumes that all your observations come from a population with variance sigma^2.

Then the weighted standard error can be estimated by the square root of sigma_w^2 times sum(wi^2)/(sum(wi))^2. In other words, the weighted variance times the sum of squares of the weights divided by the square of the sum of the weights. This estimate basically assumes that the weights represent counts of repeated observations from the same population. If this is not your assumption, then you have to try something else.
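A sketch of that estimator in Python (my reading of the post: numerator sum(wi(xi - xw)^2) and variance denominator (sum of the weights) minus one, treating weights as counts — both are assumptions on my part; the data are made up):

```python
import math

def weighted_se_counts(x, w):
    """Weighted SE treating weights as counts of repeated observations.
    Assumes the variance denominator is (sum of weights) - 1."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw      # weighted mean
    var_w = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)) / (sw - 1)
    return math.sqrt(var_w * sum(wi ** 2 for wi in w) / sw ** 2)

# With all weights equal to 1 this collapses to the ordinary
# sample standard error s / n^0.5
x = [12.0, -3.5, 40.0, 7.25, -18.0]
n = len(x)
m = sum(x) / n
s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
assert abs(weighted_se_counts(x, [1.0] * n) - s / math.sqrt(n)) < 1e-12
```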
May I ask why you subtract one from every w[sub]i[/sub]?
That assumption does not work for me.
To give a bit of background on this, the data is many thousands of transactions that are compared to a benchmark, and a performance figure measured in basis points is given to each.
Obviously the average performance of any set of trades is interesting, so this is a weighted mean (weighted by the value of the transaction). To keep things consistent I wanted to use a “weighted everything else” (weighted standard deviation, weighted standard error of the mean, etc.). I am starting to think that while weighting is necessary for reporting the average performance, when computing confidence intervals it is probably better to weight each event (trade) evenly. In other words, giving extra weight to larger trades implies that they occur more often (for calculating things like variance) which they do not.
So I guess I have answered my own question, unless anyone has any further advice.
There are a bunch of different ways to calculate weighted standard errors, and two simple ones are commonly used. One is biased and the other is unbiased.
In your “rebase” discussion, you’ve derived the two ways to specify weights. They are either stated in terms of “absolute” or “raw” weights, or “relative” or “proportional” weights.