Let's discuss Bayesian Statistics

“Bayesian statistics is the way that we combine these two pieces of data to find the true probability that the person has AIDS. Here’s how it works; suppose you apply the test to a million people, only one of whom has AIDS. Because your test is 99% accurate, you’ll pick up 10,000 false positives, but you’ll probably also pick up the guy who really has AIDS. So you’ll have 10,001 positive results, one of which is correct; therefore, the probability that the person has AIDS, given that they have a positive AIDS test, and were picked randomly from the population of people with a one-in-a-million chance of having AIDS, is 1 in 10,000. (all numbers rounded, so don’t get on my case)”

I see a problem with the above. There is a jump at the end that doesn't follow. The 1 in 10,000 is the chance that, picking one of the positive results at random in that scenario, you'd pick the one person who has AIDS. But that is not the chance that a random person from that population, to whom you administered the 99% accurate test and got a positive result, has AIDS. That chance is still around 99%.

The two sampling situations are completely different. In one you randomly pick someone from a million people and administer the test. In the other, you administer the test a million times, and randomly pick from positives. How in the world can you equate the two for the probability result?

The 99% test efficacy is not dependent in any way on the prevalence of AIDS in the population. I really don’t see any logic in your post that ties the two.

And if you randomly pick 100 of the people who tested positive for AIDS, you'd assume that 99 of them actually do have AIDS? Even though God came down and told you that only 1 person does?

And this is going beyond Bayes Theorem. If I understand correctly, Bayes Theorem is giving you relationships between probabilities. But it isn’t giving you the probabilities to start with, nor a way to calculate them on the basis of statistical results; both the method you describe, and the one given by Left Hand of Dorkness, are consistent with Bayes Theorem. (I’m not intending to contradict anything you have so far said.)

Consider this table:


              | Positive test | Negative test |      TOTAL     | 
--------------+---------------+---------------+----------------+
              |    .99*1      |    .01*1      |                |
HIV positive  |       =       |       =       |       1        |
              |       1       |       0       |                |
--------------+---------------+---------------+----------------+
              |  .01*999,999  |  .99*999,999  |                |
HIV negative  |       =       |       =       |    999,999     |
              |     10,000    |    989,999    |                |
--------------+---------------+---------------+----------------+
              |               |               |                |
TOTAL         |     10,001    |    989,999    |   1,000,000    |
              |               |               |                |
--------------+---------------+---------------+----------------+

The question is: knowing only that a person has a positive test result, what can you say about the probability that they are in fact HIV positive?

Knowing the limitations of the test, we can expect about 10,001 of the population of one million to get a positive result. Knowing the nature of the disease, we can expect one person in one million to actually have HIV.

Before knowing a person’s test result, we are entitled only to surmise that the person has a one-in-one-million likelihood of being HIV positive. Knowing that the person got a positive result on an imperfect test allows us to increase that likelihood to 1 in 10,001.
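To make the table’s arithmetic concrete, here’s a quick Python sketch of the same calculation, using the table’s rounded expected counts (the variable names are just illustrative):

```python
# Expected counts in a population of 1,000,000, using the same
# rounding as the table above.
population = 1_000_000
num_infected = 1
accuracy = 0.99  # the test is right 99% of the time

true_positives = round(num_infected * accuracy)                        # 1
false_positives = round((population - num_infected) * (1 - accuracy))  # 10,000
total_positives = true_positives + false_positives                     # 10,001

# P(HIV positive | positive test) = true positives / all positives
posterior = true_positives / total_positives
print(posterior)  # 1 in 10,001, about 0.0001
```

The prior (1 in 1,000,000) and the test’s error rate both enter the count of positives, which is why the answer depends on prevalence.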

There are ontological issues with Bayes Theorem, namely: how can our knowledge about the world, a purely mental state, affect the probabilities of events in the actual world outside of our minds? Obviously, in some sense, it cannot. Of course, probabilities themselves are weird from an ontological standpoint: the person either has HIV or doesn’t — before the test, after the (potentially false or true) positive, and once it has been unequivocally confirmed. There is no such thing as 74% HIV positive.

But probability is also not purely epistemological. To say that there is a 1/6 chance of rolling a five on a fairly-weighted die is a true statement about the die, not our mental states.

This is the locus of a lot of mystery w/r/t Bayes Theorem.

Evil Economist covered it right at the beginning. Probabilities don’t have anything to do with actuality unless the probability hits 0 or 1. Everything in between is useless in determining if something actually happened. Determining in a scientific manner anyway. If you’re going to apply the ‘reasonable doubt’ standard, they aren’t really a factor either because the people using them to make a decision may just misapply them.

So aside from statistical studies and parlor games, what would be the point in bringing the subject up? In most discussions these things come down to make determinations of the ‘more likely than not’ type, and those generally fail based on a lack of information.

Huh? God didn’t. He came down and told me that one in a million randomly picked people do. But those who tested positive are not randomly picked.

I don’t think so, but maybe I’m too invested in this to get it, so help me, please.

Your detective example doesn’t include the additional thing I said: *But is there any actual predictive value in what I’m saying? No, of course not. Because all other things are NEVER equal. They cannot be. That’s why it’s a purely theoretical statement.*

You add:

How can that happen, when I’ve said up front that it’s irrelevant?

Here is where you went astray. You are importing a fact about the world (that the likelihood of taking an HIV test correlates to self-assessments of exposure) that is not, and is not meant to be, incorporated in our model. You may presume that in this land of one million, all people are tested for HIV.

I disagree. Knowing that the person got a positive result on an imperfect test allows us to increase the likelihood to 99 in 100. The prevalence of HIV in the population has no influence, whatsoever, on the efficacy of the test. Whether it is one in a million, or one in a thousand, after the test it is still 99 in 100.

Would a judge allow this in court? For the prosecution to throw out a factually correct but absolutely pointless statistic as long as he says up front that it’s irrelevant?

See, it’s this kind of statement that makes me want to point out how it’s not technically true. I get that the gist is true, but let’s face it: there’s a non-zero chance that I have an identical twin, previously unknown to me, and it was he who robbed the bank, cut his finger, and left blood with DNA that matches mine at the crime scene.

But I’m sorry – I don’t agree that because the probability is not 0, we can regard the probability of that event as useless in determining if it actually happened.

No I didn’t. I am saying that picking only those who tested positive is not a “random” pick. Which is self-evident.

Not right up front, no, because in court there’s a rule that says to be admissible, evidence must be relevant.

However, I was not the originator of the discussion. Someone else – Huerta88, I think – first made the comment, and was told it was wrong, and I chimed in to say he was technically correct, although ultimately meaningless in the real world.

And that IS something that would be allowed in testimony: rehabilitating a witness by pointing out that what he said was not false.

I see what you’re saying, and an analogy might help.

I’ve got a random number generator that will generate a number between one and a million. It’s attached to a door. If it generates 1,000,000, the computer will rotate rooms such that there’s a brand new Ferrari behind the door. Otherwise, it rotates rooms such that there’s a hungry tiger behind the door. Got it? One in a million times=good; the rest of the time=bad.

But things aren’t so bad! I’ve got another number generator that will generate a number between 1 and 100. If it generates 1-99, it displays a sign telling you what’s behind the door; if it generates 100, it’ll display a sign telling you the exact opposite of the truth.

You look at the door. It says, “Ferrari!”

Do you open it?

I wouldn’t. Because the chance that the first random number generator rolled 1,000,000 is much lower than the chance that the second one rolled 100.

I think the probability here works exactly the same as it does in the 1-in-a-million people with AIDS, 1-in-100 people get bad test results example.
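For what it’s worth, the door example runs through Bayes Theorem the same way. A small Python sketch, with the numbers as given above (names are just illustrative):

```python
p_ferrari = 1 / 1_000_000  # first generator lands on exactly 1,000,000
p_sign_true = 99 / 100     # second generator gives 1-99: sign tells the truth

# P(sign says "Ferrari") = truthful sign with a Ferrari behind the door,
# plus a lying sign with a tiger behind it.
p_says_ferrari = p_ferrari * p_sign_true + (1 - p_ferrari) * (1 - p_sign_true)

# Bayes: P(Ferrari | sign says "Ferrari")
posterior = p_ferrari * p_sign_true / p_says_ferrari
print(posterior)  # roughly 1 in 10,000 -- keep the door shut
```

The sign saying “Ferrari!” raises the odds enormously (from one in a million), but the lying-sign cases still swamp the actual-Ferrari cases.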

My apologies then – I made the same mistake the Evil Economist did, misremembering you as the originator of this factoid instead of just its defender. And the phrase “rehabilitating the witness” is very interesting to me.

I kinda see what you’re saying…

A way to explain it better: just say that there is no Ferrari. Only hungry tigers. If you know THAT, and the sign says “Ferrari” - don’t open the door.

This is a zupes confusing example. Here’s one that might help make intuitive sense.

While you are taking a helicopter flight over some atolls in the Pacific, an emergency forces you to eject directly over the midpoint between two small islands: Cannibal Island and Vegetarian Island. Both islands look pretty similar (you can’t tell the difference) and regrettably, you spun around mightily as you were plummeting to the sea. Thus, there you are, square in the middle, not knowing which is which, and you have to get to shore.

You also know this: On Cannibal Island, 90% of the inhabitants wear bones in their noses. This is not the fashion on Vegetarian Island, where only 8% of the inhabitants do.

You come to shore and see a native watching you with a bone in his nose. How likely is it that you’re fucked? 50%? (You had a half-half chance when you decided which of the two islands to swim to, recall.) Or is it closer to 92%?

It is the latter, of course. (Well, a case could be made either way. But I think intuitively Bayes Theorem does capture something we might suspect should we find ourselves in these circumstances. Epistemology, ontology, and probability theory would not be our most pressing concern though.)

Bayes Theorem can be expressed thus:


                    p(B|C)*p(C)
p(C|B) =  -----------------------------
           p(B|C)*p(C) + p(B|~C)*p(~C)

Where B = the event that the first native you see has a bone in his nose, and C = the event that you decided to swim to what ended up being Cannibal Island. The tilde is the NOT operator.

p(C) and p(~C) are both 50%
p(B|C) = 90%
p(B|~C) = 8%

This gives 0.45/(0.45 + 0.04) = 45/49 ≈ 91.84%
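The same arithmetic in a few lines of Python, if anyone wants to check it:

```python
p_c = 0.5               # you picked an island at random
p_b_given_c = 0.90      # bone-in-nose rate on Cannibal Island
p_b_given_not_c = 0.08  # bone-in-nose rate on Vegetarian Island

# Bayes Theorem, exactly as written above
posterior = (p_b_given_c * p_c) / (p_b_given_c * p_c
                                   + p_b_given_not_c * (1 - p_c))
print(posterior)  # 45/49, about 0.9184
```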

Definitely doesn’t help me! For one thing, you’re trying to solve the probability of B|C, but since you already know the probability of B=1 in your example, that’s not necessary, I don’t think. For another, introducing a whole different set of numbers from the OP’s makes it much more confusing. But maybe it’ll help someone else.

I didn’t say it was useless in determining whether it happened. But you are assuming more than the results of the test tell you. You have to factor in the non-0 probability that the test results are incorrect, and evidence that you had the means and opportunity to commit the crime. And then when it comes to a jury, they may simply work off a basis of ‘more likely than not’ in determining a lack of reasonable doubt.

I will grant that in some cases there is a non-0 or non-1 probability that can be used to make a reasonable deduction, but they would have to be effectively 0 or 1 probabilities, and include all other information that could affect the determination.

Sorry for not responding much in my own thread, but I’m on my phone, which makes reading and posting difficult. I’ll post more in the evening.