Let’s say we’re comparing average test scores between boys (Group A), girls (Group B), and the overall group (Group C, where Group C = Group A + Group B), and we’re looking over the period of two tests.

Without knowing how many people are in each group, is it possible for the average scores of both Group A and Group B to go down since the previous test, while the overall average of Group C goes up at the same time?

No, there is no way for the average of the entire group to rise while the averages of both subgroups decline. At least one of the subgroup averages must increase in order for the main group average to increase.

Ask yourself whether the totals could go down for Group A and for Group B while the total for Group C goes up. If the numbers in A and B don’t change, each total must go up and down when its average does, and since C = A + B, you see that this is impossible.
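To make the fixed-size case concrete, here is a short sketch in symbols. The names n_A and n_B (group sizes, assumed unchanged between the two tests) and a_t, b_t, c_t (the averages of Groups A, B, and C on test t) are introduced here for illustration and are not from the original posts.

```latex
c_t = \frac{n_A a_t + n_B b_t}{n_A + n_B}, \qquad t = 1, 2

c_2 - c_1 = \frac{n_A (a_2 - a_1) + n_B (b_2 - b_1)}{n_A + n_B} < 0
\quad \text{whenever } a_2 < a_1 \text{ and } b_2 < b_1 .
```

So with identical group sizes on both tests, the overall average has to fall as well. The argument leans entirely on n_A and n_B staying the same.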

I don’t think Simpson’s paradox covers the question in the OP, where there are 2 distinct groups making up one whole. Weird things could happen within the data sets due to the paradox, but I can’t think of any way that both can decline while the main set average increases.

Ah, well then it can be thrown off: if the two groups are of widely different average scores, and the group which does worse has much larger representation the second time around.

Let’s start with 3 girls and 5 boys: all the girls have test scores of 90, so the average score for the girls is 90, and all the boys have test scores of 70, so the average for the boys is 80. The average score for boys and girls is 77.5.

For test 2, there are 20 girls and 2 boys. The girls each get a score of 85 (so the girls’ average score is 85, less than the previous test result of 90), and the boys each get a 65, so the boys’ average score goes down as well (from 70 to 65). The group average, though, goes up from 77.5 to about 83.2.
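The arithmetic in this example is easy to check with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes every boy scored 70 on the first test, as the example states.

```python
def mean(scores):
    """Plain arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)

# Test 1: 3 girls at 90 each, 5 boys at 70 each.
test1_girls = [90] * 3
test1_boys = [70] * 5

# Test 2: 20 girls at 85 each, 2 boys at 65 each.
test2_girls = [85] * 20
test2_boys = [65] * 2

test1_overall = mean(test1_girls + test1_boys)
test2_overall = mean(test2_girls + test2_boys)

print("girls:  ", mean(test1_girls), "->", mean(test2_girls))
print("boys:   ", mean(test1_boys), "->", mean(test2_boys))
print("overall:", test1_overall, "->", round(test2_overall, 1))
```

Both subgroup averages drop by 5 points, yet the overall average rises, because girls go from 3 of 8 test takers to 20 of 22.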

Yes, it can happen. AndyL gave a very nice example (it has a mistake in the first boys’ group average, but that’s obviously a typo; the result is correct). I believe this would fall under Simpson’s paradox.

An average, by itself, tells you nothing. You need to know how variable the data are around the average, and the sample size. As a scientist, I deal with this sort of thing all the time during data analysis.

I think the confusion arises because the effect only works if the populations differ in size between the two tests. When I hear “average test scores”, I implicitly think about a class where the boys and girls doing the second test are pretty much the same as the boys and girls doing the first test.

Sure it is. What specifically do you take exception to?

It helps to consider a “test” at which girls consistently, deterministically do better: you are certain that girls will get 70% and boys will get 50% every single time. Then the overall average can be as high as 70% (for an all-girl class) or as low as 50% (for an all-boy class). Increase the proportion of girls, and even if you don’t do anything else, the class average goes up solely due to the change in class composition.
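Here is a small sketch of that thought experiment. The function name `class_average` and the class size of 10 are assumptions chosen for illustration; the 70% and 50% scores are the ones given above.

```python
def class_average(n_girls, n_boys, girl_score=70.0, boy_score=50.0):
    """Overall average when every girl gets girl_score and every boy gets boy_score."""
    total = n_girls * girl_score + n_boys * boy_score
    return total / (n_girls + n_boys)

# Hypothetical class of 10 pupils, going from all boys to all girls.
class_size = 10
averages = [class_average(g, class_size - g) for g in range(class_size + 1)]

for g, avg in enumerate(averages):
    print(f"{g} girls, {class_size - g} boys: class average {avg}%")
```

Nobody's individual score changes anywhere in this loop; the class average moves only because the mix of girls and boys does.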

Simpson’s paradox simply happens when a change due to the composition of the class overwhelms a change due to an actual change in the results of the test.
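One way to see the two competing changes is to split the movement in the overall average into a part due to scores and a part due to composition. The sketch below does this for the 3 girls / 5 boys versus 20 girls / 2 boys example from earlier in the thread. The split itself (holding the test 1 mix fixed for the scores part) is one possible convention, not something the posters specified, and all names are illustrative.

```python
def overall(w_girls, girls_avg, boys_avg):
    """Overall average as a weighted mix; w_girls is the girls' share of test takers."""
    return w_girls * girls_avg + (1 - w_girls) * boys_avg

# Test 1: 3 girls (avg 90) and 5 boys (avg 70).
w1, g1, b1 = 3 / 8, 90.0, 70.0
# Test 2: 20 girls (avg 85) and 2 boys (avg 65).
w2, g2, b2 = 20 / 22, 85.0, 65.0

# Change from scores alone, keeping the test 1 mix of girls and boys.
results_effect = w1 * (g2 - g1) + (1 - w1) * (b2 - b1)
# Change from the mix alone, evaluated at the test 2 scores.
composition_effect = (w2 - w1) * (g2 - b2)

total_change = overall(w2, g2, b2) - overall(w1, g1, b1)

print("results effect:    ", round(results_effect, 2))
print("composition effect:", round(composition_effect, 2))
print("total change:      ", round(total_change, 2))
```

Under this convention the scores part is negative and the composition part is positive and larger, which is the "overwhelming" described above.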