A quick stats computation

Is the difference between 52% and 59% statistically significant for two groups of 600 people each? I forget exactly how to figure that out. A detailed answer would be appreciated.

Again, 52% of group one (N=600) and 59% of group two (N=600).

Thanks

Seriously? NO ONE?!

For a binomial distribution, the standard deviation (sigma) is sqrt (p * q * n), where p is probability, q is 1 - probability, and n is the number of trials.

sqrt (.52 * (1-.52) * 600) = 12.24
sqrt (.59 * (1-.59) * 600) = 12.047

This means that, within one standard deviation, your count will probably vary by about 12 (out of 600) in each case. Three standard deviations is 36, which is 95% confidence. The observed difference is (59-52)/100 * 600 = 42.

Since 42 is bigger than 36, we can say these results are statistically significant.

I think.
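The arithmetic in this post can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_sd` is made up for illustration. As an assumption that goes beyond the post, it also computes the standard deviation of the difference between the two counts (for independent groups the variances add), since that is the quantity a gap of 42 would usually be compared against.

```python
import math

def binomial_sd(p, n):
    """Standard deviation of a binomial count: sqrt(n * p * q), with q = 1 - p."""
    return math.sqrt(n * p * (1 - p))

n = 600
sd1 = binomial_sd(0.52, n)   # group one
sd2 = binomial_sd(0.59, n)   # group two

# Observed gap in counts: 354 - 312 = 42 people out of 600.
diff = round(0.59 * n) - round(0.52 * n)

# Assumption: the groups are independent, so the variances of the counts add.
sd_diff = math.sqrt(sd1 ** 2 + sd2 ** 2)

print(f"sd1 = {sd1:.3f}, sd2 = {sd2:.3f}")
print(f"difference = {diff}, sd of difference = {sd_diff:.3f}")
print(f"difference in units of sd_diff = {diff / sd_diff:.2f}")
```

Comparing 42 against the combined standard deviation (about 17) rather than against a single group's 12 changes how many "sigmas" the gap represents, so the two comparisons are not interchangeable.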

According to my Practical Data Analysis workbook notes, in order to determine if the % population fraction from two samples is the same, you’d need a sample size of:

n1 = n2 = 16 * (%) * (1 - %)/(delta%)^2

Assuming your expected % is around 55.5%, then plugging these numbers into the above formula I get:

n1 = n2 = 16 * (0.555) * (1 - 0.555)/(0.07)^2 = 806.449

In other words, you’d need a sample size of 806.449 in order for these to be statistically the same. Since you have fewer than that, the error in your % will be greater and therefore they are likely the same.
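For anyone who wants to check the plug-in step, here is a minimal Python sketch of the formula as quoted from the workbook. The function name `required_n_per_group` is hypothetical. The factor 16 looks like the common rule of thumb for roughly 80% power at a 5% two-sided level, but that reading is an assumption; the notes quoted above don't say where it comes from.

```python
def required_n_per_group(p, delta):
    """Workbook formula as quoted: n1 = n2 = 16 * p * (1 - p) / delta^2.

    p     -- expected proportion (here the average of 52% and 59%)
    delta -- difference in proportions you want to tell apart
    """
    return 16 * p * (1 - p) / delta ** 2

n_needed = required_n_per_group(0.555, 0.07)
print(f"n per group = {n_needed:.3f}")
```

If the rule-of-thumb reading is right, the result says how many people per group are needed to detect a 7-point gap reliably, which is a different question from whether a gap already observed with 600 per group is significant.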

But now I see Santo Rugger posted just the opposite. I’ll check around a little more.

I’m trying to work out the probability density function, but the numbers are way too huge for my calculator or MATLAB to handle (600!).
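The overflow from 600! can usually be sidestepped by working with logarithms and only exponentiating at the end. The sketch below does this in Python with `math.lgamma` (the log of the gamma function, so `lgamma(n + 1)` is log(n!)). The function names are made up for illustration, and p = 0.52 is just group one's figure used as an example.

```python
import math

def binomial_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed without forming n!."""
    log_coeff = (math.lgamma(n + 1)
                 - math.lgamma(k + 1)
                 - math.lgamma(n - k + 1))
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

def binomial_pmf(k, n, p):
    """P(X = k), obtained by exponentiating the log-probability."""
    return math.exp(binomial_log_pmf(k, n, p))

n, p = 600, 0.52
pmf_at_312 = binomial_pmf(312, n, p)            # probability of exactly 312
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(X = 312) = {pmf_at_312:.5f}")
print(f"sum over all k = {total:.10f}")          # sanity check, should be ~1
```

The same trick (subtracting log-factorials instead of dividing factorials) should carry over to MATLAB via its `gammaln` function.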

I don’t trust the reasoning behind either of the answers presented so far. There’s a calculator here that will tell you whether there’s a statistically significant difference at a specified confidence level, if that’s all you need. If you need to be able to perform the test yourself, I believe you want to look at Welch’s t-test.

For the record, the calculator above specifies that your proportions are different at 95% confidence.
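To perform a check like this by hand, one common textbook approach for two proportions is the pooled two-proportion z-test. The sketch below is an assumption about a reasonable method, not a description of what the calculator mentioned above actually does, and the function name is made up. The counts 312 and 354 assume the quoted percentages are exact (52% and 59% of 600).

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)               # shared proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z_test(312, 600, 354, 600)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
print("significant at 95%:", p_value < 0.05)
print("significant at 99%:", p_value < 0.01)
```

With these inputs z comes out near 2.44 and the p-value falls between 0.01 and 0.05, which is consistent with a difference at 95% confidence but not at 99%.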

How is the t-test different from what I did? IIRC, the t-test shows the probability of the answer being in the “tail” of the distribution curve. Three sigmas is 95%, or a 2.5% chance of being in the “tail” of the curve. Four sigmas is 99%, so using the method I used, 12 * 4 = 48 > 42, hence the results are not statistically significant at 99%.

I’m horrible at statistics, and welcome the flaw in my logic being pointed out.
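One piece that can be checked directly is the mapping from "sigmas" to confidence levels. Assuming the normal approximation holds (reasonable for n = 600, but still an assumption), the fraction of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)). The helper name `coverage_within` is made up for this sketch.

```python
import math

def coverage_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 2.576, 3, 4):
    inside = coverage_within(k)
    each_tail = (1 - inside) / 2
    print(f"{k:>5} sigma: {inside:.4%} inside, {each_tail:.4%} in each tail")
```

By this table, 95% corresponds to roughly two standard deviations (1.96) and 99% to about 2.58; three standard deviations already covers about 99.7%. So the thresholds of three and four sigmas used above look stricter than the confidence levels they are attached to.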

I follow Santo Rugger’s and Ultrafilter’s reasoning; this one makes no sense to me.