Weighted Data in Statistics

I’m currently looking into some statistics that have a bearing on a discussion I’m having with a friend.

The data is being used to compare a large group of people (Group A, 2500 people) against a small group (Group B, 59 people). I have argued that this data cannot be used to compare the groups, as Group B is too small. My friend has consulted his wife, who is a statistics expert. She says that the sample size is not as statistically important as whether the data was “weighted”. I kind of think I know what this means, but could someone explain it more clearly?

Hope this makes sense as for various reasons I don’t want to go into more detail.

Thanks people.

A weight is a value associated with a unit, which represents the relative importance of that unit when the data is analysed. A weight might be allocated according to some other value of that unit, e.g. a unit (x) has values y and z, and value y may be weighted depending on the value of z.
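To make the idea of a weight concrete, here is a minimal Python sketch of a weighted mean. The numbers are made up purely for illustration and have nothing to do with the data being discussed, and the function name `weighted_mean` is just a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Mean of `values`, with each value counted in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Made-up example: three units, where the third counts three times as much.
values = [4.0, 6.0, 8.0]
weights = [1.0, 1.0, 3.0]

unweighted = sum(values) / len(values)      # every unit counts equally
weighted = weighted_mean(values, weights)   # third unit pulls the mean upwards
print(unweighted, weighted)
```

With equal weights the two results would be identical; the weights only matter when some units are meant to count for more than others.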

In terms of sample size, there will be fewer degrees of freedom in the smaller sample. However, I don’t automatically see a problem in using samples of differing sizes (assuming it’s not a paired test). It all sort of depends on other stuff.

Proviso - I’m not a statistician, I just have to do stats a bit.

What exactly is being compared here?

Damn, I thought someone would ask.

I think it is probably relevant though. Therefore I can only apologise for this:

Cecil Speaks

PS Could some nice person correct the spelling error in the thread title?

What you’re trying to do here is to construct a test for the hypothesis that the average erect length of white males is significantly different from the average erect length of black males. This is a pretty standard type of problem, and the formulae would be included in your average undergraduate stats textbook.

Let population 1 be the white males, and population 2 the black males. m[sub]i[/sub] denotes the sample mean (average) of population i; similarly, s[sub]i[/sub] is the sample standard deviation and n[sub]i[/sub] is the sample size. We have m[sub]1[/sub] = 6.2, m[sub]2[/sub] = 6.3, n[sub]1[/sub] = 2500, and n[sub]2[/sub] = 59.

The estimated difference is m[sub]1[/sub] - m[sub]2[/sub] = -.1, and the endpoints of the 95% confidence interval are given by -.1 ± 1.96*sqrt(s[sub]1[/sub][sup]2[/sup]/n[sub]1[/sub] + s[sub]2[/sub][sup]2[/sup]/n[sub]2[/sub]). Don’t worry about where the 1.96 comes from; that’s more detail than I want to get into right now.

Unfortunately, without knowing the sample standard deviations, we can’t make any further claims about the data, but I can outline the procedure. Basically, if the confidence interval contains 0, there’s no evidence that there’s a significant difference between the two groups. Otherwise, there is. Obviously, this is a very simple explanation of the procedure; there’s more that you have to understand to use it.

I’m so glad everyone else is taking lunch right now…

According to this website (work-safe, unlike most of the other Google results…), the standard deviation for an erect penis is 1.2 inches. Since they give no indication that there’s a difference between white and black men, we’ll take s[sub]1[/sub] = s[sub]2[/sub] = 1.2. So the endpoints of the confidence interval are -.41 and .21. Since that contains 0, we don’t have any evidence here that there’s a significant difference.
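For anyone who wants to check the arithmetic, here is a rough Python sketch of that calculation. It uses the large-sample interval with the 1.96 multiplier and the assumed common standard deviation of 1.2; the function name `diff_confidence_interval` is made up for this example.

```python
import math


def diff_confidence_interval(m1, s1, n1, m2, s2, n2, z=1.96):
    """Interval (m1 - m2) +/- z * sqrt(s1^2/n1 + s2^2/n2) for a difference of means."""
    diff = m1 - m2
    std_err = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - z * std_err, diff + z * std_err


# Means and sample sizes from the thread; s1 = s2 = 1.2 is an assumption.
lower, upper = diff_confidence_interval(6.2, 1.2, 2500, 6.3, 1.2, 59)
print(f"{lower:.2f} to {upper:.2f}")        # -0.41 to 0.21
print("contains 0:", lower <= 0 <= upper)   # contains 0: True
```

Nearly all of the width comes from the 59-person group: its term under the square root is 1.44/59, about 0.0244, against 1.44/2500, about 0.0006, for the large group.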

Good lord, is it?

In rough translation, are you saying that without further information the statistics don’t prove anything one way or the other?

Yep. To compare the means of two groups, you need the sample size and standard deviation for each group.
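To show why the standard deviation matters so much, here is a small Python sketch that turns the question around: for a difference of .1 and these sample sizes, how small would the standard deviation have to be before the interval stopped containing 0? It assumes both groups share one standard deviation and uses the same 1.96 multiplier as above; `threshold_sd` is a hypothetical name.

```python
import math


def threshold_sd(diff, n1, n2, z=1.96):
    """Common standard deviation at which diff +/- z*s*sqrt(1/n1 + 1/n2) just touches 0.

    Any smaller standard deviation gives an interval that excludes 0.
    """
    return abs(diff) / (z * math.sqrt(1 / n1 + 1 / n2))


s_max = threshold_sd(6.2 - 6.3, 2500, 59)
print(round(s_max, 2))  # 0.39
```

So under these assumptions the same means and sample sizes would only point to a significant difference if the spread were below roughly 0.39 inches, which is why the means alone settle nothing.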

Ah, didn’t see your 2nd post. That’s great thanks.

And yes, I agree, a difficult subject to google!