My statistics education is old and faintly remembered; some kind soul please help me work this out.
Let us say we have population #1, which is normally distributed with mean X and standard deviation a. If we randomly select two individuals from this population, what is the probability that they will differ by at least 5%?
Let us also have population #2, which is normally distributed with mean Y and standard deviation b. If we randomly select one individual from population #1 and one from population #2, what is the probability that they will differ by at least 5%?
If the latter is difficult to calculate, how about the case in which the means differ but the standard deviations are equal (i.e. a=b)?
Saying that two individuals differ by 5% is slightly ambiguous because 5% of the smaller value is not the same as 5% of the larger value. What exactly do you mean by that?
Assuming the individuals are drawn independently, the difference between two individuals from population #1 is normally distributed with mean 0 and standard deviation a * sqrt(2), and the difference between an individual from population #1 and one from population #2 is normally distributed with mean X - Y and standard deviation sqrt(a^2 + b^2). For specific values of a and b, you can compute the probabilities that these differences will be larger than any given value, but there’s no closed-form symbolic expression, because the normal cumulative distribution function cannot be written in terms of elementary functions.
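If you want to convince yourself of those two formulas, here is a rough simulation sketch in Python. The function name and the example values for X, a, Y and b are made up for illustration, and it assumes the draws are independent.

```python
import math
import random
import statistics

def simulate_differences(mean1, sd1, mean2, sd2, n=100_000, seed=0):
    """Draw n independent pairs and return the sample mean and SD of (x1 - x2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = [rng.gauss(mean1, sd1) - rng.gauss(mean2, sd2) for _ in range(n)]
    return statistics.fmean(diffs), statistics.stdev(diffs)

# Hypothetical example values, not taken from the question
X, a = 100.0, 10.0
Y, b = 90.0, 5.0

sample_mean, sample_sd = simulate_differences(X, a, Y, b)
print("mean of differences:", sample_mean, "theory:", X - Y)
print("SD of differences:  ", sample_sd, "theory:", math.sqrt(a**2 + b**2))
```

The simulated values should land close to X - Y and sqrt(a^2 + b^2), up to sampling noise.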
Eh, I just picked 5% out of a hat for representative purposes, so as not to throw in another variable.
I’m not looking for the average difference, I’m looking for the probability of randomly picking two individuals which are more than <blah units> apart. Is there any way to calculate this?
Indeed there is, and in order to calculate it you need to compute the average difference.
If two individuals are more than <blah> apart (say z units, for convenience), then the absolute value of the difference between them is more than z.
As ultrafilter said, the difference between the two individuals follows a Gaussian distribution (call it capital Z),
- whose average is EZ = X-Y
- whose standard deviation is sZ = sqrt(a^2 + b^2).
(I call them EZ and sZ for convenience)
Now the question is: what is the probability that Z>z (or Z<-z)?
At this point you can just use an online calculator such as this one. Compute EZ, sZ and enter them in the boxes ‘Mean’ and ‘Sd’, then select ‘Outside’ and enter z in the two boxes.
To go a little deeper into it, the usual way is to ‘normalize Z’, i.e. to work on the distribution U = (Z-EZ)/sZ, which has a mean of 0 and a standard deviation of 1. (The Gaussian distribution with a mean of 0 and a std. dev. of 1 is generally written U.)
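The idea is that Z>z (or Z<-z) is equivalent to U>(z-EZ)/sZ (or U<(-z-EZ)/sZ ). Note that the two bounds are only mirror images of each other when EZ = 0, i.e. when the two means are equal.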
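Written out as one formula, this is a sketch of the normalization step. The symbol \Phi for the cumulative distribution function of U is my own notation, not something used above.

```latex
\begin{aligned}
P(|Z| > z) &= P(Z > z) + P(Z < -z) \\
           &= P\!\left(U > \frac{z - EZ}{sZ}\right) + P\!\left(U < \frac{-z - EZ}{sZ}\right) \\
           &= 1 - \Phi\!\left(\frac{z - EZ}{sZ}\right) + \Phi\!\left(\frac{-z - EZ}{sZ}\right)
\end{aligned}
```

When both individuals come from the same population, EZ = 0 and sZ = a * sqrt(2), so this collapses to 2 * (1 - \Phi(z / (a * sqrt(2)))).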
So, your procedure would be:
- You know X,Y,a,b as well as the minimal difference z
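- You compute EZ, sZ, and then the two bounds (z-EZ)/sZ and (-z-EZ)/sZ
- Then you use a Gaussian distribution table, which has been computed for U, to read off the probability that U is above the first bound and the probability that U is below the second, and you add the two.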
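The steps above can also be done in a few lines of Python instead of a printed table. This is only a sketch: the function name and the example numbers are made up, it assumes independent draws and z >= 0, and it uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the table.

```python
import math
from statistics import NormalDist

def prob_differ_by_more_than(z, mean1, sd1, mean2, sd2):
    """P(|x1 - x2| > z) for independent x1 ~ N(mean1, sd1) and x2 ~ N(mean2, sd2)."""
    if z < 0:
        raise ValueError("z must be non-negative")
    ez = mean1 - mean2                 # EZ = X - Y
    sz = math.sqrt(sd1**2 + sd2**2)    # sZ = sqrt(a^2 + b^2)
    u_upper = (z - ez) / sz            # Z > z   <=>  U > u_upper
    u_lower = (-z - ez) / sz           # Z < -z  <=>  U < u_lower
    u = NormalDist()                   # standard normal: mean 0, std. dev. 1
    return (1.0 - u.cdf(u_upper)) + u.cdf(u_lower)

# Hypothetical example: two individuals from the same population
# (mean 100, std. dev. 10) differing by more than 5 units
print(prob_differ_by_more_than(5.0, 100.0, 10.0, 100.0, 10.0))

# Hypothetical example: one individual from each of two populations
print(prob_differ_by_more_than(5.0, 100.0, 10.0, 110.0, 8.0))
```

If "5%" is meant as 5% of the mean, z would be 0.05 * X here; that is one possible reading of the question, not the only one.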