The thread that inspired this one is Bricker’s question in the “Why hasn’t the Neighborhood Watch shooter been arrested?” thread, where the discussion essentially amounts to Bayesian statistics.
This is an older discussion/argument, which essentially amounts to answering the following question: if black women are almost never raped by white men, is a black woman who claims to have been raped by a white man more likely to be a liar?
This is essentially a Bayesian statistics question, which I’d like to address in this thread. And, given that this is the Dope, I imagine that I’ll get corrected a few times, and learn something in the process myself, so please chime in.
To understand Bayesian statistics, first consider the following question (which must be the most commonly asked trick question in statistics):
Suppose that in the country of Statistica one in a million people has AIDS. You know this is true because God parted the heavens and told you so. You have an AIDS test which is 99% accurate (God told you this also). You pick a random Statistica citizen and administer the test. It comes back positive. What is the probability that this person has AIDS?
Your typical person will think for a second and conclude that if the AIDS test is 99% accurate, and came back positive, there is a 99% chance that the person has AIDS.
Sounds good, right? Well, this is a trick question. You actually have two pieces of data which you have to consider; you know that the test is 99% accurate, but you also know that only one in a million people has AIDS. Suppose that instead of administering the test, you just assumed that the tested person didn’t have AIDS. You would be right 99.9999% of the time; your guess would be more accurate than the AIDS test.
So, is the chance that the person has AIDS 0.0001%? Well, no, because you also have a test that says he does have AIDS, and you have to consider that also.
Bayesian statistics is the way that we combine these two pieces of data to find the true probability that the person has AIDS. Here’s how it works: suppose you apply the test to a million people, only one of whom has AIDS. Because your test is 99% accurate, it will be wrong about 1% of the 999,999 healthy people, so you’ll pick up roughly 10,000 false positives, but you’ll probably also pick up the guy who really has AIDS. So you’ll have about 10,001 positive results, one of which is correct; therefore, the probability that the person has AIDS, given that they have a positive AIDS test, and were picked randomly from the population of people with a one-in-a-million chance of having AIDS, is about 1 in 10,000. (all numbers rounded, so don’t get on my case)
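For anyone who wants to check the arithmetic without rounding, here is a minimal Python sketch. It assumes that "99% accurate" means the test catches 99% of real cases and also gives a false positive for 1% of healthy people; the original question doesn't spell that out, so treat it as my reading. The function name `posterior` and its parameter names are just made up for illustration.

```python
from fractions import Fraction

def posterior(prior, sensitivity, false_positive_rate):
    """P(has the disease | positive test), by Bayes' rule."""
    true_pos = prior * sensitivity                  # sick and tests positive
    false_pos = (1 - prior) * false_positive_rate   # healthy but tests positive
    return true_pos / (true_pos + false_pos)

# Assumption: "99% accurate" = 99% sensitivity and a 1% false-positive rate.
prior = Fraction(1, 1_000_000)  # one in a million, per the question
p = posterior(prior, Fraction(99, 100), Fraction(1, 100))

print(p)         # exact fraction
print(float(p))  # same value as a decimal
```

The exact answer comes out slightly under 1 in 10,000, which matches the rounded figure above.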
OK, so now that you understand the concept of Bayesian statistics, consider the following modified question:
Suppose that in the country of Statistica one in a million people has AIDS. You know this is true because God parted the heavens and told you so. You have an AIDS test which is 99% accurate (God told you this also). You pick a random Statistica citizen and administer the test. The random person you picked is a male prostitute who happens to also be an intravenous drug user who shares needles, and who doesn’t use those wax pieces of paper to cover the seat when he uses public toilets. The test comes back positive. What is the probability that this person has AIDS?
OK, let’s do the math. …carry the one, multiply by six…OK, the chance this person has AIDS is 1 in 10,000, right? If this answer sounds wrong to you, and you can figure out why, then you know the problem with applying statistics to the rape question. You also know the general problem with applying Bayesian statistics to social sciences questions, and why so many people make so many mistakes trying to do it.
In fact by naively applying your Bayesian statistics math in this case you’re moving away from the right answer. You’d have been better off never having heard of Bayesian statistics, because by trying to apply it here you’re making your answer more wrong than the 99% guy in the first question.
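To see how much the answer depends on which population the one-in-a-million figure really describes, here is a small Python sketch that reruns the same calculation with different starting rates. The two "high-risk" rates are numbers I invented purely for illustration; nobody handed them down from the heavens, and they are not real statistics. As before, "99% accurate" is assumed to mean 99% sensitivity and a 1% false-positive rate.

```python
from fractions import Fraction

def posterior(prior,
              sensitivity=Fraction(99, 100),
              false_positive_rate=Fraction(1, 100)):
    """P(has the disease | positive test), by Bayes' rule."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Only the first prior comes from the question. The other two are
# hypothetical, chosen just to show the effect of the reference class.
hypothetical_priors = {
    "whole population (1 in 1,000,000)": Fraction(1, 1_000_000),
    "made-up high-risk group (1 in 100)": Fraction(1, 100),
    "made-up high-risk group (1 in 10)": Fraction(1, 10),
}

results = {label: posterior(prior) for label, prior in hypothetical_priors.items()}

for label, p in results.items():
    print(f"{label}: {float(p):.4%}")
```

Same test, same positive result, but the answer swings from about 1 in 10,000 to better than even odds depending on which starting rate you plug in. That is the whole problem in a nutshell: the math is fine, but it is only as good as the population the prior was measured on.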
Let’s suppose that all the claims in the various threads are true: suppose black women are only raped by white men 1% of the time. Suppose that 8% of rape allegations are false. Can we combine these statistics to determine if a particular black woman who claims to have been raped by a white man is lying?
No, we can’t. Suppose that the 1% figure is true. It’s likely true because black women are rarely alone with white men, so it’s a statistic that doesn’t apply to a black woman alone in a house full of white men. If we start calculating Bayesian statistics using this number, our calculations will be more wrong than if we ignored it.
Suppose that the 8% figure is true. It’s likely true because, I don’t know, some women are trying to get back at ex-boyfriends who angered them. Does this number apply to women who claim to have been raped at a frat party? Almost certainly not, and by using this statistic in your calculations, you’ll be more wrong than if you ignored it.
Now, that’s not to say that we can’t apply Bayesian statistics in this case; we just have to use the right data. We need to know the percentage of black strippers who are raped while alone in a house with drunk frat boys, and we need to know the percentage of women who claim they were raped during a drunken frat party who are lying, and then maybe we can start making some estimates. But if we start applying population statistics to the wrong population, then we will get the wrong answer. And it won’t just be wrong, it’ll probably be more wrong than if we just ignored the statistics entirely.