[Statistics] Combine two RMS values with different means

I don’t normally post threads like this but I’m a bit time-limited, I think there must be a simple answer, and I haven’t been able to find it by googling.

I have two data sets, each with a different mean and an RMS error (deviation from the mean). I want to find the RMS error of the entire set. Obviously the combined mean is just the average of the two original means, but I don’t know how to find the overall RMS error from the combined mean. I don’t have access to the original data, but the two sets can be assumed to be uncorrelated and to contain an equal amount of data.

I probably have no idea what I’m talking about, but I recently solved an electronics question in which I learned that the RMS of something with two components is the square root of the sum of the squares of the individual RMS values.

Signal 1: RMS = 5
Signal 2: RMS = 12

RMS combined = sqrt(25 + 144) = 13

Does that help?

I might also be completely misremembering.

I think that’s correct if they have the same mean, but it doesn’t help for this problem. Here is an extreme example that should illustrate the problem:

Data set 1 and Data set 2 both have an RMS error of zero. The mean of the first set is -1 and the mean of the second set is 1.
Solution: The mean of the combined set is 0 and the RMS value is 1.
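
A quick sanity check in Python (the four-point data set here is just my illustration of those two degenerate sets):

```python
import statistics

# Two "sets", each with zero RMS error: {-1, -1} and {1, 1}
data = [-1, -1, 1, 1]
mean = statistics.fmean(data)   # 0.0
rms = statistics.pstdev(data)   # population std about the mean: 1.0
print(mean, rms)
```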

The Wikipedia article on Standard deviation shows the formula for calculating the combined standard deviation of two sets: it’s a function of the number of elements in each set, the means of each set, and the standard deviations of each set. Since you can calculate the standard deviation from the RMS and the mean (and vice versa), I think this gives you the information you need.

The combined mean is not, in general, the average of the two means: you weight each mean by the number of data points used to calculate that mean, then divide the sum of the weighted means by the sum of the weights.
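
In Python terms (the counts and means here are made up, just to show the weighting):

```python
# Hypothetical per-group counts and means
n1, m1 = 100, 4.0
n2, m2 = 300, 8.0

mt = (n1 * m1 + n2 * m2) / (n1 + n2)
print(mt)  # 7.0, not the plain average 6.0, because set 2 has more data
```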

By RMS error, do you mean standard deviation?

In this specific case the combined mean is the average of the two means, since each set contains the same amount of data, as I mentioned in the OP.

By RMS error I do mean standard deviation but I don’t want to imply that the data follows a Gaussian distribution. The individual data sets are probably reasonably Gaussian but the combined set will not be (the histogram will contain two peaks, one at the mean of each original set).

Thank you, this is what I needed. I thought it would be a simple formula like this.

Glad to help.

Let me guess: The databases are human heights, one database for men, and one for women, right?

Not even close. Measurements of signal jitter in an electrical circuit :slight_smile:

To get the true answer you would have to go back to the original data, but if your samples are large enough that the means and RMS values are well measured, then you can use the following formula.
For n1 and n2 the numbers of samples in the two groups,
m1 and m2 the means of the two groups,
and r1 and r2 the RMS errors of the two groups:

total mean = mt = (n1*m1 + n2*m2)/(n1 + n2)

total RMS = sqrt((n1*((m1-mt)^2 + r1^2) + n2*((m2-mt)^2 + r2^2))/(n1 + n2))

If the sample sizes are the same, so that n1 = n2, this reduces to

total mean = (m1 + m2)/2
total RMS = sqrt((r1^2 + r2^2 + (1/2)*(m1-m2)^2)/2)
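
If it helps, here is that formula as a small Python sketch (the function name combine_rms and the sample numbers are mine, just for illustration):

```python
import math

def combine_rms(n1, m1, r1, n2, m2, r2):
    """Combine the mean and RMS error (deviation about the mean)
    of two groups, given only the per-group summary statistics."""
    # Weighted combined mean
    mt = (n1 * m1 + n2 * m2) / (n1 + n2)
    # Each group contributes its own spread (r^2) plus the squared
    # offset of its mean from the combined mean
    rt = math.sqrt((n1 * ((m1 - mt) ** 2 + r1 ** 2)
                    + n2 * ((m2 - mt) ** 2 + r2 ** 2)) / (n1 + n2))
    return mt, rt

# Sanity check against the extreme example from earlier in the thread:
# two zero-RMS sets with means -1 and +1 combine to mean 0, RMS 1.
print(combine_rms(10, -1.0, 0.0, 10, 1.0, 0.0))  # (0.0, 1.0)
```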