Statistics to prove anything

Quite a few years ago I remember reading a news story about a university that was accused of sex discrimination because a higher percentage of male applicants than female applicants was admitted. It was a long time ago and I do not have the details, but one thing really caught my attention. I have recreated the situation with made-up numbers (and it did take me a while of juggling them, believe me). They do not attempt to represent any real situation; it is how the numbers work out in the end that counts.

OK… the University of X was accused of sexism in its admissions because, while 60% of male applicants were admitted, only 50% of female applicants were. For a while the typical arguments and counterarguments were thrown around. (It is sexism! No it’s not, we are impartial, maybe the women are not as well prepared… etc.)

After a while it was decided to do some further analysis to see which of the schools was doing the discriminating (my hypothetical university has two schools, A and B). It turned out that, individually, both A and B were admitting a higher percentage of women than men, but when the numbers were aggregated the percentage of men admitted was higher! Sounds impossible? Here is a set of numbers to illustrate it:

School A:
male applicants 75, accepted 50 (67%)
female applicants 23, accepted 16 (70%)

School B:
male applicants 25, accepted 10 (40%)
female applicants 77, accepted 34 (44%)

In both schools a higher percentage of females than males was accepted, and yet when you consolidate the figures you get:

male applicants 100, accepted 60 (60%)
female applicants 100, accepted 50 (50%)
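
For anyone who wants to check the arithmetic rather than take it on faith, here is a minimal Python sketch that recomputes the per-school and combined rates from the made-up figures above. The names `data`, `per_school_rates` and `combined_rates` are just illustrative choices, not anything from the original case.

```python
# Made-up admissions figures from the post: (applicants, accepted)
data = {
    "A": {"male": (75, 50), "female": (23, 16)},
    "B": {"male": (25, 10), "female": (77, 34)},
}


def per_school_rates(data):
    """Acceptance rate for each sex within each school."""
    return {
        school: {sex: accepted / applicants
                 for sex, (applicants, accepted) in by_sex.items()}
        for school, by_sex in data.items()
    }


def combined_rates(data):
    """Acceptance rate for each sex after adding the schools together."""
    totals = {}
    for by_sex in data.values():
        for sex, (applicants, accepted) in by_sex.items():
            n, k = totals.get(sex, (0, 0))
            totals[sex] = (n + applicants, k + accepted)
    return {sex: k / n for sex, (n, k) in totals.items()}


for school, rates in per_school_rates(data).items():
    print(school, {sex: f"{r:.1%}" for sex, r in rates.items()})
print("combined", {sex: f"{r:.1%}" for sex, r in combined_rates(data).items()})
```

Women come out ahead in each school taken separately, and men come out ahead once the two schools are added together, exactly as in the tables.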

Besides the obvious point that you have to be VERY careful when using statistics to prove anything, what other conclusions can we draw from this example? (I will tell you later what conclusions I seem to remember from what I read; as I say, it is a very vague memory.)

Well, I posted this a couple of months ago and it did not get one single solitary reply. :frowning:

I thought I’d give it another chance by bumping it up again. (OTOH, with the board running at this speed maybe everybody just went to the beach…)

There’s an old saying that liars figure and figures lie.

This one is my recent favorite. I can’t remember the numbers exactly but I know the number 8 was involved so I’ll use that.

There is an antismoking advertisement on TV from an organization called “the truth”. I don’t know exactly who they are, but that’s beside the point. In a recent ad they quoted a statistic that went something like this.

“The tobacco industry loses 8 customers every second (or minute, or hour, like I said I can’t remember)”

Well, couldn’t this same thing be said about the auto industry, the computer industry, Coke, etc.?

The ad never mentions what the CAUSE of death is. It never says that these are SMOKING-related deaths. All it’s saying is that 8 smokers die every second (min, hr?). They could have died falling down the stairs for all the statistic shows.

Keeping the thread alive!!

-KAT

You do know that 64.3% of statistics are made up on the spot…


Yer pal,
Satan

I HAVE BEEN SMOKE-FREE FOR:
Three months, two days, 22 hours, 5 minutes and 29 seconds.
3756 cigarettes not smoked, saving $469.60.
Life saved: 1 week, 6 days, 1 hour, 0 minutes.

Only 80% of statistics have any base in truth. The other 37% are made up…

Another good example.

A study shows that up to 80 percent of all Americans suffer from migraines. Does that mean that the other 20% enjoy them?

Cheap, I know, but I found it somewhat relevant.

In Martin Gardner’s aha! Gotcha, he mentions a case of this paradox arising at UC Berkeley in 1973 in graduate programs.

I vaguely remember a more recent case, also at UC Berkeley, possibly in the ’80s.

zgystardst, that explains where I may have seen it long ago as I have read quite a few things by Gardner. I’ll try to find the original (I had to make up the numbers for this example).

Still, no one is answering the original question, which is “what is the explanation of this paradox?”

Anybody?

I’m sorry to say that there’s really not a very satisfying explanation of your paradox (which is a rather good one, I might add). It’s just a mathematical quirk that happens because School A, the school with higher admission rates, has a much higher impact on the total admission rate than School B does. However, School B has just as big an impact on the total number of applicants. “That’s just how it works out” may not be very satisfying, but that’s what makes it a good paradox.

PS: Sorry I didn’t post to this two months ago - it really is a good thread.

Well, there is an apparent contradiction, but there is also some conclusion to be gained by analysis. I guess the original question is (or was): are they discriminating in their admissions? In any case, what is happening?

The reason statistics can’t be trusted:

Three statisticians are out hunting rabbits. Suddenly a rabbit jumps out of the bushes just in front of them and starts hopping away.

The first statistician fires at the rabbit, and shoots over it.

The second statistician fires at the rabbit, and undershoots it.

The third statistician hollers out, “We got him!”

I think the OP demonstrates that statistics need to be taken with a grain of salt and always kept in context. Speaking from a communications perspective (my profession) I’d have to say that stats are usually more trouble than they’re worth. We use them to demonstrate what we want to say and our opponents use them to contradict what we’re saying. Often we’re using the same base but applying different interpretations.

And there’s the rub: we all appreciate that interpretations will differ, yet we generally apply that to general situations and/or language. The arts and philosophies, if you will. However, stats are associated with mathematics - a science in which everything is black or white. Right or wrong. This contributes to statistics being given an unrealistic - and all too often undeserved - credibility.
Having said that - I toyed with the idea of becoming a statistician when I was younger. Oh the irony.

Well, I’m rather tired (it’s WAY too late for me), but off the top of my head, I’d say that the two populations should either NOT be combined (something like a heterogeneity chi-squared test could be done to show this?) or that the vast amount of variability in the combined score needs to be shown.
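
To put a number on the "should NOT be combined" idea, here is a rough Python sketch. Note that it does not run the heterogeneity chi-squared test mentioned above; it swaps in a simpler, related tool, the Mantel-Haenszel pooled odds ratio, which compares women with men inside each school before averaging, and sets it beside the crude odds ratio you get from lumping the schools together. The figures are the made-up ones from the original post, and the function names are only illustrative.

```python
# One 2x2 table per school, using the made-up figures from the original post:
# (female accepted, female rejected, male accepted, male rejected)
tables = {
    "A": (16, 23 - 16, 50, 75 - 50),
    "B": (34, 77 - 34, 10, 25 - 10),
}


def odds_ratio(a, b, c, d):
    """Odds of acceptance for women divided by odds of acceptance for men."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Odds ratio after adding the schools together (ignores the school)."""
    a, b, c, d = (sum(column) for column in zip(*tables.values()))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio that keeps each school as its own stratum."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables.values())
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables.values())
    return num / den


for school, table in tables.items():
    print(f"school {school}: {odds_ratio(*table):.2f}")
print(f"crude (combined): {crude_odds_ratio(tables):.2f}")
print(f"Mantel-Haenszel:  {mantel_haenszel_odds_ratio(tables):.2f}")
```

A value above 1 favours women and a value below 1 favours men. The crude figure falls below 1 while the stratified figure stays above 1, which is the same reversal seen in the percentages. This only describes direction; with samples this small it says nothing about statistical significance.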

It’s like the old stats joke - half of me is in the freezer, the other half in the oven, but on the whole, I’m rather comfortable.

If you’ve got a couple of bucks you can get HOW TO LIE WITH STATISTICS from amazon.com. Just the book ya need.

Hmm, good thread. The problem is, in no small part, that people do not feel qualified to challenge statistics or dig deeper (or are not interested in doing so). Obviously the average doper doesn’t suffer from this problem. :slight_smile:

Our local paper was trying to build some sort of scandal out of the fact that the average home school kid only scored at the 50th percentile on some test. Sigh.

I’ve got this book called “A statistician reads the newspaper” which, if I ever get the time to read it, may prove to be more of the same. I’ve seen “How To Lie with Statistics” and I recall it being good.

Anyway, if more people didn’t just swallow numbers as they read them, they wouldn’t have such a powerful potential to mislead.

–The former statistics TA

Cranky,

Want to piss off a bunch of parents? Stand up at a PTA meeting and tell them half their kids are below average…

>> tell them half their kids are below average…

Not in the town of Lake Wobegon they’re not. Lake Wobegon, where all the women are strong, all the men are good looking and the children are above average.

OK, since nobody is attempting any analysis of the OP and all we are getting is bad jokes about statistics, I will explain what I remember the conclusions being in the original case.

Reminder: the board saw that, as a whole, a higher percentage of women were being rejected and assumed the cause must be discrimination. So they asked for more detailed information, specific to each school, to see which one was the one discriminating. When they got the detailed information they were surprised to see that both schools were admitting a higher percentage of women than men. How can this be explained?

The explanation is this: School B is much tougher to get into than School A for everybody, but more women are applying to B while more guys are applying to A, which is easier to get into.

The explanation offered was that guys had a clearer idea of their chances and applied where their chances were better, while women were applying where they wanted to go without regard to whether they had any chance.

So the cause was found not to be discrimination at all but, rather, women’s unrealistic expectations.
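
The mechanism described here (School B being harder for everybody, and most women applying there) can be seen directly if each sex’s overall rate is written as a weighted average of the two schools’ rates, where the weight is the share of that sex’s applicants going to each school. The sketch below does just that with the made-up numbers from the first post; `weighted_breakdown` is only an illustrative name.

```python
# (applicants, accepted) per school, made-up numbers from the original post
schools = {
    "A": {"male": (75, 50), "female": (23, 16)},
    "B": {"male": (25, 10), "female": (77, 34)},
}


def weighted_breakdown(schools, sex):
    """Return (overall_rate, [(school, weight, school_rate), ...]) for one sex.

    weight is the fraction of that sex's applicants who applied to the school.
    """
    total = sum(by_sex[sex][0] for by_sex in schools.values())
    parts = []
    for name, by_sex in schools.items():
        applicants, accepted = by_sex[sex]
        parts.append((name, applicants / total, accepted / applicants))
    overall = sum(weight * rate for _, weight, rate in parts)
    return overall, parts


for sex in ("male", "female"):
    overall, parts = weighted_breakdown(schools, sex)
    terms = " + ".join(f"{w:.2f} x {r:.3f}" for _, w, r in parts)
    print(f"{sex}: {terms} = {overall:.2f}")
```

For the men, 75% of the weight sits on the easier School A; for the women, 77% of the weight sits on the tougher School B. The women’s slightly higher rate within each school cannot make up for that difference in weights.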

I hope that explains it but I have a feeling people were much more interested in the jokes than in this conclusion…

No, I was more interested in the conclusion - though I assumed you didn’t actually have the answer (sorry).

Are you sure, however, that it was a case of men having more realistic expectations? That seems like a little hop of faith. Or did they somehow establish this with a bit of research? Which probably used stats… hmm…

:wink:

>> Are you sure, however, that it was a case of men having more realistic expectations? That seems like a little hop of faith. Or did they somehow establish this with a bit of research? Which probably used stats… hmm…

Well, I am not sure how to interpret that. I mean, the men were applying to the schools where they had a better chance while the women were applying to schools where they had a smaller chance… That is the fact… Are you asking about their motivations? I don’t know them and I am not sure we need them… The fact is women were applying to schools where it was much more difficult for everybody to get in, and that is the cause of them being rejected on the whole more than men, even though each individual school rejected them less often. So it was not the schools discriminating against them; it was their choices that put them in that position.
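
One way to test the "it was their choices" reading is a what-if calculation: keep each school’s acceptance rate for each sex exactly as it is, and ask what the overall rates would be if men and women had spread their applications across the schools in the same proportions. The sketch below uses the made-up numbers from the first post. It rests on a big assumption, flagged in the code as well: that the per-school rates would not change if applicants moved around, which a real admissions office would not guarantee.

```python
# Per-school acceptance rates and applicant counts (made-up numbers from the OP)
admit_rate = {
    "A": {"male": 50 / 75, "female": 16 / 23},
    "B": {"male": 10 / 25, "female": 34 / 77},
}
applicants = {
    "A": {"male": 75, "female": 23},
    "B": {"male": 25, "female": 77},
}


def standardized_rate(sex, mix):
    """Overall rate for `sex` if its applicants were split across schools as in `mix`.

    ASSUMPTION: each school's rate for each sex stays fixed when the mix changes.
    """
    total = sum(mix.values())
    return sum(mix[school] / total * admit_rate[school][sex] for school in mix)


# Scenario 1: both sexes apply in the pooled proportions (98 to A, 102 to B)
pooled_mix = {school: sum(counts.values()) for school, counts in applicants.items()}
# Scenario 2: women apply in the same proportions the men actually did (75 / 25)
male_mix = {school: counts["male"] for school, counts in applicants.items()}

print(f"same mix, men:   {standardized_rate('male', pooled_mix):.1%}")
print(f"same mix, women: {standardized_rate('female', pooled_mix):.1%}")
print(f"women applying like men: {standardized_rate('female', male_mix):.1%}")
```

Under either scenario the women’s overall rate ends up higher than the men’s, which fits the claim that the 60% versus 50% gap comes from where people applied and not from how the schools treated them.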

I guess the next step would be to find out the cause of this (are they overoptimistic, unrealistic, misinformed, …?) and try to correct it.

But the fact is that you have to know how to interpret statistics and you cannot just say group X is underrepresented somewhere and it must be discrimination.