Help Me Understand This Math Problem

Hallo.

Long time, first time, and all that.

I have a math problem that is driving me batty. I strongly suspect I am misinterpreting the wording, and thus getting screwed up. I’m wondering how others interpret it? If you read it differently than I do, please tell me how you see it! I am not looking for the numerical answers; this is not a request for others to do my homework. I can mess that up all on my own, thanks. :smiley:

The problem, as it is written:
Illustrate Simpson’s paradox. (Don’t worry if you don’t know what that is. Just ignore it.) Make up tables of overweight (yes/no) by early death (yes/no) by smoker (yes/no) such that:

  • Overweight smokers and overweight nonsmokers both tend to die earlier than those not overweight.
  • But when smokers and nonsmokers are combined in a two-way table of overweight by death, persons not overweight die earlier.

How I am reading it:

Condition 1: Fatns > Slimns, Fats > Slims

Condition 2: Fatns + Fats < Slimns + Slims

THIS IS UNPOSSIBLE! Bigger + bigger IS NOT less than Smaller + Smaller!
Or, another way, if rock beats scissors twice, two bloody scissors is not gonna beat two bloody rocks in a double duel!

It’s driving me crazy. Help?

Surely it is just meant to illustrate that most smokers aren’t overweight but still die early.

You’re not factoring in the fact that there are different numbers of people in each category, AND ALSO different death ages. Also, not that it matters, but I think your <'s and >'s are the wrong way round. So your equations should actually be:

Condition 1: Fatns < Slimns, Fats < Slims

Condition 2: (Fatns * FatnsNumberInCohort + Fats * FatsNumberInCohort) / (FatnsNumberInCohort + FatsNumberInCohort) > (Slimns * SlimnsNumberInCohort + Slims * SlimsNumberInCohort) / (SlimnsNumberInCohort + SlimsNumberInCohort).

And yes, it is possible to find an example - but I’m not posting it :wink:

There’s your problem right there. You do have to worry if you don’t know what that is. The whole point of your question is that you don’t understand Simpson’s paradox.

Check out the Wikipedia article. In particular, you might want to look at the “kidney stone treatment” example, and mentally substitute “Overweight” and “Not Overweight” for “Treatment A” and “Treatment B,” and “smokers” and “nonsmokers” for “large stones” and “small stones.”

A common situation where something like this can happen is in batting averages. It’s a question of weighted averages. The sum total of my knowledge of baseball boils down to: The season is split in half by the All Star Break. This is probably not even true, but it doesn’t matter. :smiley:

So we have two batters. In the first half of the season, batter A has a better average than batter B. In the second half of the season batter A has a better average than batter B. Over the season as a whole, batter B has a better average than batter A. How can this happen? Here’s an extreme case:

1st half: Batter A gets up to bat and hits a home run. This is the first game of the season. At his next at bat he throws out his back swinging for the fences and doesn’t come up to bat again until after the break. His average? .500

1st half: Batter B has a great start to the season: he comes to bat 100 times and hits safely in 34 of them. His average is .340

2nd half: Batter A is off the disabled list and comes to bat 100 times. He hits safely 25 of those times and so has an average of .250. This is still pretty good.

2nd half: Batter B is tired from going so gangbusters in the first half and only bats 10 times. He only gets 1 hit and retires in disgrace. Effigies are burned and his children never show their faces in public again, so ashamed are they of his pathetic .100 batting average.

Over the course of the whole season? Batter A has 102 at bats and 26 hits for an average of .255, which is a respectable number. Batter B has 110 at bats and 35 hits. His average is .318. Oops.
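For anyone who wants to check the arithmetic, here is a minimal Python sketch. The hits and at bats are the ones from the story above; the dictionary layout and function names are just made up for illustration.

```python
from fractions import Fraction

# (hits, at_bats) per half of the season, copied from the example above
batter_a = {"first": (1, 2), "second": (25, 100)}
batter_b = {"first": (34, 100), "second": (1, 10)}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def season_average(halves):
    """Pool hits and at bats over both halves, then divide once."""
    total_hits = sum(hits for hits, _ in halves.values())
    total_at_bats = sum(at_bats for _, at_bats in halves.values())
    return average(total_hits, total_at_bats)


for half in ("first", "second"):
    a = average(*batter_a[half])
    b = average(*batter_b[half])
    print(f"{half} half: A {float(a):.3f}, B {float(b):.3f}")

season_a = season_average(batter_a)
season_b = season_average(batter_b)
print(f"season: A {float(season_a):.3f}, B {float(season_b):.3f}")
```

The season average is not the average of the two half-season averages. It weights each half by its at bats, so A's season is dominated by his weak second half and B's by his strong first half.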

As I understand it, this type of thing happens fairly often in baseball. Some players play more at the beginning of the season and then get hurt or whatnot and so play less. Other players really step it up at the end of the season when playoff time starts getting close. Because they have different numbers of at bats in the different halves of the season, the averages do counterintuitive things.

Another example is alcoholic content in drinks. Bacardi is 40% alcohol while Malibu is only 20%. Everclear is 95% alcohol while 151 is 75%. If you take a bottle of Bacardi and put in a splash of Everclear you get something with less punch than a bottle of 151 with a splash of Malibu, even though Bacardi is stronger than Malibu and Everclear is stronger than 151.
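Here is a rough Python sketch of that mix. The percentages are the ones quoted above, but the volumes (a 750 ml bottle and a 50 ml splash) are my own assumption, picked only to make the weighting visible.

```python
# Alcohol fractions quoted in the post
BACARDI, MALIBU, EVERCLEAR, RUM_151 = 0.40, 0.20, 0.95, 0.75

# Assumed volumes in millilitres (not from the post)
BOTTLE_ML = 750
SPLASH_ML = 50


def mix_strength(parts):
    """Alcohol fraction of a mix, given (strength, volume_ml) pairs."""
    alcohol_ml = sum(strength * volume for strength, volume in parts)
    total_ml = sum(volume for _, volume in parts)
    return alcohol_ml / total_ml


bacardi_mix = mix_strength([(BACARDI, BOTTLE_ML), (EVERCLEAR, SPLASH_ML)])
rum_151_mix = mix_strength([(RUM_151, BOTTLE_ML), (MALIBU, SPLASH_ML)])

print(f"Bacardi + splash of Everclear: {bacardi_mix:.1%}")
print(f"151 + splash of Malibu:        {rum_151_mix:.1%}")
```

With those volumes, each mix ends up close to the strength of whatever filled most of the bottle, which is the same weighting effect as in the batting averages.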

By the way, you shouldn’t make any of these concoctions because they’d be gross. And you’ll probably go blind.

The first example I saw of Simpson’s paradox was in looking at gender-specific admissions statistics for two colleges. At both colleges, girls who applied were more likely to be admitted than boys, but when you looked at the combined admissions, boys were more likely to be accepted than girls. I wish I could remember the rest of the details.

This sounds like the “Berkeley sex bias case” described in the Wikipedia article I linked earlier, or something quite similar. John Allan Paulos used this example or one like it in his book Innumeracy, which may well have been the first example I ever saw of Simpson’s paradox, though I don’t recall whether Paulos identified “Simpson’s Paradox” by that name.

Right. Simpson’s Paradox is about a pooled pattern conflicting with stratified patterns.

Thanks for all the help. The issue, however, was not misunderstanding Simpson’s paradox; that I understood (I have like 5 other problems about it, which weren’t very hard). The problem was the wording of the problem itself.

The first part of the problem was to make a table showing:
“Overweight smokers and overweight nonsmokers both tend to die earlier than those not overweight.”

The problem for me was the word AND.

I read AND as “additive”- making the sentence “Overweight smokers PLUS overweight nonsmokers both tend to die earlier than those not overweight.”

Where I should have read it as “in addition to”- making it: “Overweight smokers **AS WELL AS** overweight nonsmokers both tend to die earlier than those not overweight.”

My solution is thusly:
Condition 1:

| Group | Early Death | Late Death | Total | Percent Die Early |
|---|---|---|---|---|
| Overweight Smoker | 654 | 566 | 1220 | 53.61% |
| Slim Smoker | 1076 | 1050 | 2126 | 50.61% |
| Overweight Nonsmoker | 606 | 688 | 1294 | 46.83% |
| Slim Nonsmoker | 76 | 88 | 164 | 46.34% |

Condition 2:

| | Overweight | Slim |
|---|---|---|
| Early Death | 1260 | 1152 |
| Late Death | 1254 | 1138 |
| Total | 2514 | 2290 |
| Percent Die Early | 50.12% | 50.31% |
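A quick way to double-check tables like these is a few lines of Python. This is only a sketch: the counts are copied from the Condition 1 table, and the dictionary layout and function names are made up for the check.

```python
# (early deaths, late deaths) for each group, from the Condition 1 table
counts = {
    ("overweight", "smoker"): (654, 566),
    ("slim", "smoker"): (1076, 1050),
    ("overweight", "nonsmoker"): (606, 688),
    ("slim", "nonsmoker"): (76, 88),
}


def early_rate(early, late):
    """Fraction of a group that dies early."""
    return early / (early + late)


def pooled_rate(weight):
    """Early-death rate for a weight class, smokers and nonsmokers combined."""
    early = sum(e for (w, _), (e, _) in counts.items() if w == weight)
    late = sum(l for (w, _), (_, l) in counts.items() if w == weight)
    return early_rate(early, late)


# Condition 1: within each smoking group, the overweight die early more often
within_strata = all(
    early_rate(*counts[("overweight", s)]) > early_rate(*counts[("slim", s)])
    for s in ("smoker", "nonsmoker")
)

# Condition 2: once the smoking groups are combined, the slim die early more often
pooled_reversed = pooled_rate("overweight") < pooled_rate("slim")

print("Condition 1 holds:", within_strata)
print("Condition 2 holds:", pooled_reversed)
```

The reversal works because most of the slim people in these tables are smokers (2126 of 2290), so the pooled slim rate sits close to the higher smoker rate.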

If all that is badly formatted once I hit post, I apologize.

Thanks again for the help.