This is a poll of your intuitive response to the given scenario. Don’t do any calculations in your head, and don’t read the thread before deciding on an answer – just say what you’d guess the answer to be if you only had a few seconds to think. If you’ve come across similar questions in the past and are thus able to give a more accurate guess, please mention that.
You go to the hospital and are randomly screened for a serious but rare disease with which one person in a thousand is infected. The test for the disease has a false positive rate of 5% and false negative rate of 1%. That is, if you’ve got the disease, the test will come out ‘positive’ 99% of the time, while if you don’t have the disease, the test will come out ‘negative’ 95% of the time.
Your test comes back positive – what is the likelihood that you actually have the disease?
Probably somewhere around 25%, just based on the answers I’ve gotten for this problem before. Those usually use a 1% disease rate, though, so this might be lower.
I cheated and did the calculations. For those who are curious, the probability is in the spoiler box below.
The probability is roughly .02. This is a standard homework problem from an introductory probability class involving something known as Bayes’ theorem.
I’d read this in one of John Allen Paulos’s books; I think it was A Mathematician Reads the Newspaper. I remembered the results were surprising, so I got out the calculator:
[spoiler]Take a random group of 1,000,000 people. 1,000 of them are infected with the disease.
Of those 1,000, 99%, or 990, will get a positive result.
Of the 999,000 who are NOT infected, 5%, or 49,950, will get a false positive result.
So 50,940 people will get positive results. Of those, 49,950 will NOT have the disease.
If you get a positive result, the odds are thus 98% that you do NOT have the disease. Weird.
Unless my math is wrong, in which case my only excuse is having been an English major.[/spoiler]
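For anyone who would rather let a computer do the counting, here is a minimal sketch of the same head-count approach. The function name `count_outcomes` and its parameter names are made up for illustration; the numbers are the ones from the question.

```python
def count_outcomes(population, prevalence, false_positive_rate, false_negative_rate):
    """Split a hypothetical population into true and false positives."""
    infected = population * prevalence
    healthy = population - infected
    # Infected people who test positive (the test misses the false-negative share).
    true_positives = infected * (1 - false_negative_rate)
    # Healthy people who test positive anyway.
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


tp, fp = count_outcomes(1_000_000, 0.001, 0.05, 0.01)
# Share of all positive results that belong to people who really are infected.
share_infected = tp / (tp + fp)
print(round(tp), round(fp), round(share_infected, 3))
```

The counts match the 990 and 49,950 worked out by hand above, and the share of positives who are actually infected comes out a bit under 2%.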
Ah. I gave my “intuitive response” rather than attempting to think through the statistical probability. However, this highlights one of the problems in using statistics in relation to a single individual: I’m just a speck in the data pool from an objective perspective, but when you ask me to intuit *my own* position in a data set, I’ve got all sorts of information that makes me non-random in terms of my own assessment of my risk (which in this sample you’d call “error variance”), such as a knowledge of my risk factors for exposure to or development of the disease. I may have a statistically low chance of having the disease in your sample, but that chance is a function of my place in a random sample, and would differ if your pool weren’t random (e.g., if you screened out everyone who hadn’t eaten disease-bearing candy within the week, and only tested those of us who had).
On top of that, I imagine that our ability to intuit our own odds depends in part on our relative pessimism, versus our ability to intuit the general odds for a random person unknown to us.
This is very similar to a question given as homework in my Economics and Business Statistics class (we’re on the ‘Probability’ chapter). The numbers are different, but the question is identical.
The instructor did the calculations for your question as an example, and then (using the same numbers) posed two more to us for homework: What proportion of all tests are positive? and If the test result is negative, what is the probability you do not have the disease?
Putting it in probability terms …
P(D) represents the probability of having the disease. P(D’) represents the probability of not having the disease.
P(T) represents the probability of a positive test. P(T’) represents the probability of a negative test.
You’re asking for P(D|T), or the probability of having the disease, given a positive test result. My teacher’s additional questions are asking for 1. P(T) = P(T|D)P(D) + P(T|D’)P(D’), or the probability of a positive test given the person has the disease, weighted by the probability of having the disease, plus the probability of a positive test given the person does not have the disease, weighted by the probability of not having it, and 2. P(D’|T’), or the probability of not having the disease given a negative test result.
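Here is a rough sketch of all three quantities in Python, using the numbers from the original question. It assumes the proportion of positive tests is the total probability P(T) = P(T|D)P(D) + P(T|D’)P(D’); the variable names are my own, not the instructor’s.

```python
# Givens from the question.
p_d = 0.001              # P(D): prevalence, one person in a thousand
p_t_given_d = 0.99       # P(T|D): 1 minus the 1% false negative rate
p_t_given_not_d = 0.05   # P(T|D'): the 5% false positive rate

p_not_d = 1 - p_d        # P(D')

# Proportion of all tests that come back positive (law of total probability).
p_t = p_t_given_d * p_d + p_t_given_not_d * p_not_d

# Bayes' theorem: probability of disease given a positive test.
p_d_given_t = p_t_given_d * p_d / p_t

# Probability of no disease given a negative test.
p_not_t = 1 - p_t
p_not_d_given_not_t = (1 - p_t_given_not_d) * p_not_d / p_not_t

print(p_t, p_d_given_t, p_not_d_given_not_t)
```

With these inputs, about 5.1% of all tests are positive, a positive test means roughly a 2% chance of disease, and a negative test is right well over 99.99% of the time.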
I just copied the problem from Wikipedia’s article on Bayes’ Theorem. I wouldn’t be able to use Bayes’ Theorem off the top of my head, but I ran across a problem like this in college and thought it was interesting. And, of course, most people will get the question very wrong, giving an answer in the 90s.
Shoshana: And that, presumably, is why they don’t do random screening for very rare diseases: the flood of false positives would obscure the true positives.
jackelope: Heh, nice common sense solution. I don’t think it would have occurred to me to solve the problem like that.
I think I’ve heard a variation of this problem as a Car Talk Puzzler (or something like that) and so suspected that the answer was that it is a lot less likely that you have the disease than it seems like it should be. I mean, positive test result means you have the disease, doesn’t it?
There are four possible combinations of Disease and Test:
D and T ~ Disease Present and Screen Shows Positive
D and T* ~ Disease Present and Screen Shows Negative
D* and T ~ Disease Absent and Screen Shows Positive
D* and T* ~ Disease Absent and Screen Shows Negative
Pr{D|T} ~ Probability that a subject has the disease, given a positive test
Pr{D*|T} ~ Probability that a subject lacks the disease, given a positive test
Pr{D*|T*} ~ Probability that a subject lacks the disease, given a negative test
Pr{D|T*} ~ Probability that a subject has the disease, given a negative test
What we have is that
So, Pr{T|D*} = “False Positive” = .05, so then Pr{T*|D*} = .95
And Pr{T*|D} = “False Negative” = .01, so then Pr{T|D} = .99
What we seek is Pr{D|T}, the probability that disease is present given a positive screen.
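Written out in the same notation, a sketch of the Bayes’ theorem calculation for Pr{D|T} looks like this. It takes Pr{D} = .001 from the “one person in a thousand” in the original question, so Pr{D*} = .999.

```latex
\Pr\{D \mid T\}
  = \frac{\Pr\{T \mid D\}\,\Pr\{D\}}
         {\Pr\{T \mid D\}\,\Pr\{D\} + \Pr\{T \mid D^*\}\,\Pr\{D^*\}}
  = \frac{(.99)(.001)}{(.99)(.001) + (.05)(.999)}
  = \frac{.00099}{.05094}
  \approx .0194
```

So only about 2% of positive screens correspond to actual disease.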
This is a very nice explanation as to why screening tests can be controversial, even though everyone agrees that catching the disease you are screening against is a good thing.
For a screening test to be of much value, 99% sensitivity and specificity is pretty much a given.
We need the prevalence of disease (Pr{D}), Pr{T*|D*} = 1 - Pr{T|D*}, where Pr{T|D*} is the false-positive rate, and Pr{T*|D}, the false-negative rate. Recall that Pr{D*} = 1 - Pr{D}.
Suppose that we are given that Pr{T|D*} = “False Positive” = .05, Pr{T*|D} = “False Negative” = .01 and Pr{D} = .001. Then Pr{T*|D*} = .95, Pr{T|D} = .99 and Pr{D*} = .999.
Plugging in what we know:
Pr{D*|T*} = (.95)(.999)/((.95)(.999) + (.01)(.001)) = .9999+. So in this case, approximately 99.99% of the negative tests actually indicate absence of disease.
What they’re asking for here is called positive predictive value (PPV); its counterpart for negative results is negative predictive value (NPV). While sensitivity and specificity (the chances that someone with the disease tests positive and that someone without it tests negative, respectively) are inherent characteristics of a test, PPV and NPV depend on the prevalence of the disease. So an HIV test has the same sensitivity and specificity wherever you are, but its PPV and NPV change depending on the population.
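To see how much prevalence matters, here is a small sketch that holds the test from the original question fixed (99% sensitivity, 95% specificity) and varies only the prevalence. The function name `predictive_values` is hypothetical, and the 1% and 10% prevalences are made-up comparison points, not figures for any real disease or test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test every time; only the population changes.
# 0.001 is from the question, 0.01 and 0.1 are illustrative assumptions.
results = {}
for prevalence in (0.001, 0.01, 0.1):
    results[prevalence] = predictive_values(0.99, 0.95, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence={prevalence}: PPV={ppv:.3f}, NPV={npv:.5f}")
```

The PPV climbs from about 2% at one in a thousand to roughly 17% at one in a hundred and about 69% at one in ten, while the NPV stays very close to 100% throughout.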
Not really. A screening test needs to be positive in everyone with the disease, so ideally sensitivity is up around 100%. It would be nice if it were also 100% specific, but it doesn’t really have to be–that’s what the confirmatory tests are for.
For instance, lots of women without cervical cancer have abnormal Pap smears, and lots of women without breast cancer have abnormal mammograms. (In fact, neither of those is close to 99% sensitivity, either, but you get the point.)
Depends on how the test works, not just its probabilities. Many quick tests look for antibodies (which show that you’ve been in contact with the pathogen) and not for the pathogen that produces the antibodies.
40% of my 8th grade classmates tested positive for tuberculin; none had it, but the actual tuberculosis test isn’t the tuberculin… it’s the chest X-rays.
Despite the fact that I research this kind of stuff for a living, I would for sure be freaking out and convinced I had it. That’s just my personality, which I know is not grounded in any actual reality.