statistical testing

I’m trying to figure out how bullying affects graduation rates.

I have a group of students, which I divide into Group A, who report having been bullied, and Group B, who do not. Group A graduates high school at rate Y and Group B graduates at rate Z. What test would I perform to decide whether bullying has a statistically significant effect?

My last stats class was like 7 years ago, so the less technical the better.

Do you have any reason to believe that bullying is the only difference between the two groups? If not, you need to find out what the other differences are and incorporate them into your dataset to make any reasonable inference.

Post the dataset (anonymized) and I will analyze it for you. But in a nutshell, you want to compare the following augmented (A) and compact (C) models:

A: graduation_rate = b[sub]0[/sub] + b[sub]1[/sub]bullied + error
C: graduation_rate = mean(graduation_rate) + error
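For anyone who wants to see what comparing those two models amounts to, here is a minimal sketch in plain Python. The data are made up purely for illustration, and the helper names (`sse`, `compare_models`) are my own, not from any library. It assumes each student is coded 1 or 0 for both bullied and graduated, so the fitted b0 is the graduation rate of the non-bullied group and b1 is the difference between the two groups.

```python
# Made-up data for illustration only (not from the thread).
# bullied:   1 = reports being bullied, 0 = does not
# graduated: 1 = graduated, 0 = did not
bullied = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
graduated = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]


def mean(values):
    return sum(values) / len(values)


def sse(values, predictions):
    """Sum of squared errors between observed values and predictions."""
    return sum((v - p) ** 2 for v, p in zip(values, predictions))


def compare_models(x, y):
    """Compare compact model C (grand mean) with augmented model A (b0 + b1*x)."""
    n = len(y)
    # Model C: one parameter, the grand mean.
    sse_c = sse(y, [mean(y)] * n)
    # Model A: with a single 0/1 predictor, the least-squares fit
    # predicts each group's own mean.
    b0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
    b1 = mean([yi for xi, yi in zip(x, y) if xi == 1]) - b0
    sse_a = sse(y, [b0 + b1 * xi for xi in x])
    # F statistic for the one extra parameter, with n - 2 error
    # degrees of freedom.
    f = (sse_c - sse_a) / (sse_a / (n - 2))
    return b0, b1, sse_c, sse_a, f


b0, b1, sse_c, sse_a, f = compare_models(bullied, graduated)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSE(C) = {sse_c:.2f}, SSE(A) = {sse_a:.2f}, F = {f:.2f}")
```

The F value would then be compared against an F distribution with 1 and n - 2 degrees of freedom. A large F means the bullied term reduces the error by more than chance alone would be expected to.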

Let’s assume that it’s the only difference. You can’t have a group that is completely homogeneous except for this one thing, but let’s assume so.

Is this the same as having two dice, and trying to figure out if one is weighted?

Sounds like you have a two-by-two contingency table, and could use Fisher’s exact test or the appropriate chi-square test.

A simple t-test should do the job (or, if your samples are large enough, a z-test: the test statistic is computed in much the same way, but you compare it against a standard normal distribution rather than a t distribution). It assumes equal variance in the two groups, which to my knowledge is usually a fairly safe assumption to make. The next item on the Wiki page I linked to covers cases where the equal variance assumption does not hold.
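As a rough illustration of the large-sample z-test, here is a sketch for the case where the outcome is simply graduated or not. It uses the pooled two-proportion form of the z-test, which is my assumption about the version that fits this data. The counts are invented and the function name is hypothetical.

```python
import math


def two_proportion_z_test(grads_a, n_a, grads_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = grads_a / n_a
    p_b = grads_b / n_b
    # Under the null hypothesis both groups share one graduation rate,
    # so the standard error uses the pooled rate.
    pooled = (grads_a + grads_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 60 of 100 bullied students graduated,
# 80 of 100 non-bullied students graduated.
z, p_value = two_proportion_z_test(60, 100, 80, 100)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

A p-value below your chosen threshold (0.05 is the usual convention) would count as statistically significant.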

I would agree with the t-test. It’s not a contingency table, as “bullying” is the independent variable and the dependent variable is presumably a continuous variable.

You could also, of course, do a one-way between-subjects ANOVA, which gives F, which equals t[sup]2[/sup] when there are two groups.
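If you want to convince yourself of the F = t squared relationship, a quick numerical check is easy to write. This is only a sketch with made-up continuous scores. The helper names are mine, and the t statistic is the equal-variance (pooled) version.

```python
import math

# Made-up continuous scores for two groups (illustration only).
group_a = [2.0, 3.0, 4.0, 5.0]
group_b = [4.0, 6.0, 5.0, 7.0, 8.0]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values, center):
    """Sum of squared deviations of values from center."""
    return sum((v - center) ** 2 for v in values)


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq_dev(a, mean(a)) + sum_sq_dev(b, mean(b))) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


def one_way_anova_f(a, b):
    """One-way between-subjects ANOVA F statistic for two groups."""
    grand = mean(a + b)
    ss_between = (len(a) * (mean(a) - grand) ** 2
                  + len(b) * (mean(b) - grand) ** 2)
    ss_within = sum_sq_dev(a, mean(a)) + sum_sq_dev(b, mean(b))
    df_between = 1  # number of groups minus one
    df_within = len(a) + len(b) - 2
    return (ss_between / df_between) / (ss_within / df_within)


t = pooled_t(group_a, group_b)
f = one_way_anova_f(group_a, group_b)
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

The two printed values agree up to rounding, so with two groups the tests lead to the same conclusion.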

I would probably agree with this.

It sounded to me like the data consisted of counts of the numbers of people in four categories: bullied and graduated, bullied and didn’t graduate, not bullied and graduated, not bullied and didn’t graduate. This would be a 2x2 contingency table. But perhaps I’ve misunderstood.

Reading the OP again, now I’m not sure what was meant. I didn’t interpret it that way initially, but it makes sense both ways now.

This one.

So, you want a chi-square test such as Pearson’s chi-square, or Fisher’s exact test. The latter gives an exact p-value even if the sample size is small. As pointed out by another poster, unfortunately you can’t draw any conclusions about whether bullying actually affects graduation rates, because there was no experimental control – there may be some other variable that both causes people to be bullied and causes them to have a low graduation rate.
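For the curious, both tests are short enough to sketch by hand using only Python’s standard library. The 2x2 counts below are invented, and the function names are my own. The Fisher version uses the common two-sided rule of summing every table that is no more probable than the observed one, and the chi-square version is Pearson’s statistic without a continuity correction.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(k):
        # Number of ways to get k in the top-left cell with margins fixed
        # (the numerator of the hypergeometric probability).
        return math.comb(row1, k) * math.comb(row2, col1 - k)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more likely than the observed one.
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / math.comb(n, col1)


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p-value.

    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts:           graduated   did not graduate
#   bullied                      6              4
#   not bullied                  9              1
fisher_p = fisher_exact_two_sided(6, 4, 9, 1)
chi2, chi2_p = pearson_chi_square(6, 4, 9, 1)
print(f"Fisher exact p = {fisher_p:.4f}")
print(f"chi-square = {chi2:.2f}, p = {chi2_p:.4f}")
```

With counts this small the two p-values differ noticeably, which is the reason Fisher’s exact test is preferred for small samples: the chi-square p-value is only an approximation that improves as the counts grow.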

Here is a web site that can calculate the Fisher’s and Chi-Squared tests for you. As stated above, the Fisher’s test is probably better, particularly if your sample size is small.