I’m currently looking into some statistics that have a bearing on a discussion I’m having with a friend.
The data is being used to compare a large group (Group A=2500) of people against a small group (Group B=59). I have argued that this data cannot be used to compare the groups as Group B is too small. My friend has consulted his wife, who is a statistics expert. She says that the sample size is not as statistically important as whether the data was “weighted”. I kind of think I know what this means, but could someone explain it more clearly?
Hope this makes sense, as for various reasons I don’t want to go into more detail.
A weight is a value associated with a unit, which represents the relative importance of that unit (when analysed). A weight might be allocated according to some other value of that unit e.g. a unit (x) has values y and z. Value y may be weighted dependent on the value of z.
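To make that a bit more concrete, here is a rough sketch of a weighted average next to a plain one. The numbers and the function name `weighted_mean` are made up purely for illustration. They have nothing to do with the data in the question, and this is only one simple way weights get used.

```python
def weighted_mean(values, weights):
    """Average of `values`, where each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical units: three observed values, the third counted twice as heavily.
values = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]

unweighted = sum(values) / len(values)     # every unit counts equally
weighted = weighted_mean(values, weights)  # the third unit pulls the average up

print(unweighted, weighted)
```

With equal weights the two averages would be the same. The weighted one only differs because the third unit is treated as more important.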
In terms of sample size, there will be fewer degrees of freedom in the smaller sample. However, I don’t automatically see a problem in using samples of differing sizes (assuming it’s not a paired test) - it all sort of depends on other stuff.
Proviso - I’m not a statistician, I just have to do stats a bit.
What you’re trying to do here is to construct a test for the hypothesis that the average erect length of white males is significantly different from the average erect length of black males. This is a pretty standard type of problem, and the formulae would be included in your average undergraduate stats textbook.
Let population 1 be the white males, and population 2 the black males. m[sub]i[/sub] denotes the sample mean (average) of population i; similarly, s[sub]i[/sub] is the sample standard deviation and n[sub]i[/sub] is the sample size. We have m[sub]1[/sub] = 6.2, m[sub]2[/sub] = 6.3, n[sub]1[/sub] = 2500, and n[sub]2[/sub] = 59.
The test statistic is m[sub]1[/sub] - m[sub]2[/sub] = -.1, and the endpoints of the 95% confidence interval are given by -.1 ± 1.96*sqrt(s[sub]1[/sub][sup]2[/sup]/n[sub]1[/sub] + s[sub]2[/sub][sup]2[/sup]/n[sub]2[/sub]). Don’t worry about where the 1.96 comes from; that’s more detail than I want to get into right now.
Unfortunately, without knowing the sample standard deviations, we can’t make any further claims about the data, but I can outline the procedure. Basically, if the confidence interval contains 0, there’s no evidence that there’s a significant difference between the two groups. Otherwise, there is. Obviously, this is a very simple explanation of the procedure; there’s more that you have to understand to use it.
I’m so glad everyone else is taking lunch right now…
According to this website (work-safe, unlike most of the other Google results…), the standard deviation for an erect penis is 1.2 inches. Since they give no indication that there’s a difference between white and black men, we’ll take s[sub]1[/sub] = s[sub]2[/sub] = 1.2. So the endpoints of the confidence interval are -.41 and .21. Since that contains 0, we don’t have any evidence here that there’s a significant difference.
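For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation in Python. It uses the means and sample sizes quoted earlier in the thread and assumes, as above, that both groups share the 1.2 inch standard deviation. The function name `diff_confidence_interval` is just something made up for this post, and 1.96 is the usual multiplier for a 95% interval.

```python
import math


def diff_confidence_interval(m1, s1, n1, m2, s2, n2, z=1.96):
    """Approximate confidence interval for the difference of two means (m1 - m2)."""
    diff = m1 - m2
    # Standard error of the difference between two independent sample means.
    std_err = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = z * std_err
    return diff - margin, diff + margin


# Figures from the thread; s1 = s2 = 1.2 is an assumption, not measured per group.
low, high = diff_confidence_interval(m1=6.2, s1=1.2, n1=2500,
                                     m2=6.3, s2=1.2, n2=59)
contains_zero = low <= 0 <= high

print(round(low, 2), round(high, 2), contains_zero)
```

Nearly all of the width of the interval comes from the s[sub]2[/sub][sup]2[/sup]/n[sub]2[/sub] term, so the small group does not make the comparison invalid; it just makes the interval wider.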