Statisticians, Help Me!!!!

jebert · April 16, 2004, 4:18pm

We are having a difficult time understanding when to use two different types of hypothesis tests - Goodness-of-fit and Test for Independence. Both of them calculate a Chi-Square statistic and compare it to a critical value at a chosen confidence level in order to accept or reject a null hypothesis.

I will use a textbook example which I adapted for this inquiry.

Let’s say we are trying to determine whether exercise relates to illness. We look at 100 people in each of 3 categories:

A - Exercise at an organized fitness center
B - Exercise on their own
C - Do not exercise

We look at illness in a given year.

In case 1 we know only how many people in each group did or did not get sick in the year. Say 12 people got sick in A, 15 in B, and 23 in C. The rest in each group were healthy all year.

In case 2 we know only the number of illnesses in each group during the year (any person could have more than one illness during the year). Say 15 illnesses in A, 18 in B, and 33 in C.

We want to test the null hypothesis:

Ho: Exercise has no effect on illness.

Which test, Goodness of Fit or Independence, is correct to use in each case, and why? Would the choice of test in each case be different if we had unequal sample sizes: instead of 100 people in each group we have, say, 80 people in A, 95 in B, and 120 in C?

Some of us claim that the same test should be used in both cases, and others say one test should be used for case 1 and the other test for case 2. We realize that the final acceptance or rejection of the null hypothesis might well be the same whichever test we choose; we are really interested in the proper test to use and how to make that choice.

Can you give us a clear general rule for deciding when to use each type of test?
(Note: We’re not interested in the details of the calculations, just how to decide the correct test.)

nivlac · April 16, 2004, 4:54pm

Case 1: Use the Chi-Square test on a contingency table with the two classifications (rows and columns) being “Level of exercise” and “Got Sick/Not Sick”. This is actually equivalent to a goodness-of-fit test that all three exercise levels are equally likely to get sick.
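For anyone who wants to see the Case 1 table concretely, here is a minimal sketch in plain Python using the counts from the question. The function name `chi_square_independence` is made up for illustration, and the p-value shortcut relies on the fact that this table has 2 degrees of freedom, where the chi-square tail probability is exactly exp(-x/2).

```python
import math

# Case 1 counts from the question: every person falls in exactly one cell.
# Each row is (sick, not sick) for one exercise level, 100 people per group.
observed = {
    "A": (12, 88),
    "B": (15, 85),
    "C": (23, 77),
}

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count if exercise level and illness are independent.
            expected = row_total * col_total / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof

stat1, dof1 = chi_square_independence(observed)
# Closed-form tail probability, valid only because dof1 == 2 here.
p_value1 = math.exp(-stat1 / 2)
print(f"chi-square = {stat1:.3f}, dof = {dof1}, p = {p_value1:.4f}")
```

Note that the expected counts come from the table's own row and column totals, so nothing has to be assumed in advance about the overall illness rate.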
Case 2: Don’t see how you can construct a valid contingency table from the given data since you allow multiple illnesses to be counted for the same subject during the year. In a contingency table, every observation has to fall into a unique cell. I would do a goodness-of-fit test here with the assumption that the expected number of illnesses should be the same across exercise levels under the null hypothesis.
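A rough sketch of the Case 2 goodness-of-fit calculation, again in plain Python with the counts from the question. The helper name `chi_square_gof` is hypothetical. With three categories there are 2 degrees of freedom, so the tail probability is again exactly exp(-x/2).

```python
import math

# Case 2 counts from the question: total illnesses per group, where one
# person may contribute more than one illness.
illnesses = [15, 18, 33]  # groups A, B, C

def chi_square_gof(observed, expected):
    """Return the goodness-of-fit chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Null hypothesis with equal group sizes: illnesses are spread evenly,
# so each group expects one third of the total.
total = sum(illnesses)
expected_equal = [total / len(illnesses)] * len(illnesses)

stat2 = chi_square_gof(illnesses, expected_equal)
dof2 = len(illnesses) - 1
# Closed-form tail probability, valid only because dof2 == 2 here.
p_value2 = math.exp(-stat2 / 2)
print(f"chi-square = {stat2:.3f}, dof = {dof2}, p = {p_value2:.4f}")
```

Here the expected counts are fixed by the hypothesis itself (an even split) rather than estimated from row and column totals, which is what separates this from the contingency-table calculation.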
In both cases, having unequal sample sizes would not change the choice of test, although in Case 2 you’d have to adjust the expected values so that each group’s expected number of illnesses is proportional to its sample size.
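The adjustment for unequal sample sizes in Case 2 can be sketched as follows. The group sizes 80, 95, and 120 are from the question, but the question gives no illness counts for that scenario, so reusing 15, 18, and 33 here is purely a hypothetical assumption to make the arithmetic visible.

```python
import math

group_sizes = [80, 95, 120]   # A, B, C, from the question
illnesses = [15, 18, 33]      # HYPOTHETICAL: reused from the equal-size case

total_people = sum(group_sizes)
total_illnesses = sum(illnesses)

# Under the null hypothesis, each group's share of illnesses matches its
# share of the people, so expected counts scale with group size.
expected_scaled = [total_illnesses * n / total_people for n in group_sizes]

stat3 = sum((o - e) ** 2 / e for o, e in zip(illnesses, expected_scaled))
dof3 = len(illnesses) - 1
# Closed-form tail probability, valid only because dof3 == 2 here.
p_value3 = math.exp(-stat3 / 2)
print([round(e, 2) for e in expected_scaled])
print(f"chi-square = {stat3:.3f}, dof = {dof3}, p = {p_value3:.4f}")
```

The test itself is unchanged; only the expected values move from an even split to a split weighted by group size.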
