Why is the answer to "the most annoying math puzzle" not 8?

Double posting to specifically highlight this. The standard approach, training us all to treat sensitivity as THE number that matters, does NOT inform accurately. It exploits and often misdirects. Headlines that engaged readers in resisting the intuition, the pull toward “eight”, without exploiting the cognitive illusion, would inform about public health more accurately.

I’d prefer that neither is used as a sensationalist headline, as I’ve already clearly stated.
There is no need to state figures that you know will be misinterpreted by an audience who are unlikely to dig deeper to uncover the detail.

If I suspected I had cancer, and there was a test that could be done, and I asked you how likely the test was to pick up cancer if I had it, you would not say 7.5%. Or at least I hope you would not.
But that is precisely the scare-mongering intent behind the construction of the headline as I gave it.

More correctly, you mean that quoting the sensitivity number alone does not inform fully. I agree. Nor does any simplistic headline; on these matters they are so often constructed to purposefully exploit and misdirect.

The sensational sells media and a sober consideration of the risks and benefits of a complex medical intervention does not.

One of them generally is used, though, as the norm: the one that emphasizes sensitivity as the number to care about, that sensationalizes the risk of missed cases, and that generally ignores false positive rates and their real consequences.

Headlines are what they are. A short teaser to get people to read the article. Hoping for them to be something else is frankly silly.

Now you are asking about a diagnostic test. That is a different question, and I would actually deflect it by first finding out why you suspect cancer. Are there symptoms or signs that are worrisome? Is there a relevant family history? What numbers I respond with (assuming I know them), and how I present them, depends on the responses. If you are in fact at low risk for the cancer, and testing has more potential risks/costs than benefits, then framing with those numbers highlighted is how I would respond. I hope that doesn’t disappoint you.

No. That is not more correctly what I mean. I mean that it misinforms by strongly reinforcing the cognitive illusion, in a way the alternative headline does not.

The actual numbers on breast cancer screening are not as in that question, but the impact of the cognitive illusion on public reaction to guidelines recommending against screening 40- to 50-year-old women without specific risk factors has been very real, and has made those guidelines very “controversial”.

Not in the slightest; it is exactly the right response. We do not disagree.

Giving the answer “7.5%” and then walking away would not be a good response at all. But that is far too often the practical effect of sensationalist media headlines and poor quality mainstream media articles.

Then I don’t think we are going to agree on this. The headline quoting the 7.5% figure as the test’s “accuracy” is absolutely playing into that cognitive illusion and relying on people misunderstanding.

I apologize in advance for this sounding snarky, but I really cannot think of another way to say it.

If you think that the 7.5% number plays into, rather than against, that Bayesian-based cognitive illusion, then you really do not comprehend what the illusion is.

Are we perhaps talking about different cognitive errors?

When people are presented with the words “accuracy” and “only 7.5%” in a headline, there is a high likelihood of a certain assumption being made and an incorrect conclusion being drawn. Very much akin to the original junction problem with “equivalent” and the implied proportion 2000/16 = 1000/?.

The headline makers rely on that in order to create a sensational headline whilst still being able to plausibly claim accurate reporting.

Their thought process is not “this scientifically defensible figure will be correctly interpreted by our readers as referring to the output of multiple interacting factors that we will then go on to explain in detail”

I’m talking about the Bayesian-based one illustrated by the breast cancer screening question, and the endemic and pernicious impact of that cognitive illusion.

This impact is something I have experienced constantly.

Back in training: professors, very intelligent, literate in math, reading, and science, who spent most of their clinical time dealing with one very narrow end of a big filter, seeing many cases of what is very rare in the general population, berating students and interns for not ordering tests to “rule out” various zebras with “you can’t find it if you don’t look for it”, with no discussion of, and honestly no understanding themselves of, the huge probability of false positives that would result, or of the harms incurred as we chased down those rabbit holes needlessly. (For them it was both the Bayesian-based illusion and the representativeness illusion that underlies the “Linda problem”.)

When supervising others (well educated, highly literate in math, reading, and science): explaining why knee-jerk panel testing was not the best tactic, and how to use tests selectively, even from the POV of “defensive medicine”.

Most days: dealing with worried-well parents wanting testing (or treatment) of their child “just in case”, or thinking that “it couldn’t hurt” or is “just to be sure”, and the challenge of explaining why more testing and treatment is not better care in this case (it is much faster just to order the test and prescribe the meds, but it is bad care).

Many days: explaining the results of newborn screening tests to scared parents, and how the tests are aimed at not missing any of the conditions, but most abnormal results will turn out fine on follow-up; no need to panic.

That’s the one I’m discussing.

I think we’ve been talking at cross-purposes then. I used the output of that question to make a point about sensationalist and misleading headlines that appeal to another deficiency of human cognitive behaviour, a far less technical one.

The wider nuances of Bayesian errors are not necessarily relevant to what I was talking about, so no wonder there was some confusion. My error; perhaps I should have steered clear of that example completely (and it’s not as if there aren’t myriad examples of other equally misleading headlines regarding statistics in the realm of healthcare).

The idea that you were always only on a crusade against generically misleading headlines is disingenuous. When you saw the positive predictive value statistic (the 7.5% figure), which apparently you were unfamiliar with, you jumped on it as though it were inherently misleading, implying that it had a much greater potential to generate misleading headlines than the sensitivity statistic (80%). And you have continued to explicitly defend that position, claiming that hypothetical headlines that state the sensitivity statistic are “less wrong” than hypothetical headlines that state the positive predictive value.

Let’s forget about headlines for a moment.

Do you understand and accept why, in the case of a low-cost cancer screening, the positive predictive value (7.5%: the proportion of true positives among all positive results) is usually the single relevant statistic that allows us to think rationally about the cost-benefit of the screening test? And that the sensitivity (80%: the proportion of positive results among patients who actually have the disease) is relevant only as one of the inputs used to calculate the positive predictive value?

Do you understand why, in the example of the COVID test that @DSeid described, the sensitivity is still only relevant as one of the inputs to the calculation of the single important statistic, in this case the negative predictive value (the proportion of true negatives among all negative results), which is what tells us the probability that we may infect grandma after a negative test?
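For anyone who wants to check the arithmetic behind both of those questions, here is a minimal Python sketch. The inputs (1% prevalence, 80% sensitivity, 90% specificity) are illustrative assumptions of mine, chosen because they reproduce a positive predictive value near the 7.5% figure discussed above; they are not necessarily the exact numbers used earlier in the thread.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Apply Bayes' theorem to a binary test; fractions of a population of 1.0."""
    tp = prevalence * sensitivity              # true positives
    fn = prevalence * (1 - sensitivity)        # false negatives (missed cases)
    fp = (1 - prevalence) * (1 - specificity)  # false positives
    tn = (1 - prevalence) * specificity        # true negatives
    ppv = tp / (tp + fp)  # P(disease | positive result)
    npv = tn / (tn + fn)  # P(no disease | negative result)
    return ppv, npv

# Illustrative assumptions only: 1% prevalence, 80% sensitivity, 90% specificity.
ppv, npv = predictive_values(0.01, 0.80, 0.90)
print(f"PPV = {ppv:.1%}")  # about 7.5%: most positives are false alarms
print(f"NPV = {npv:.1%}")  # about 99.8%: a negative result is genuinely reassuring
```

Note that the 80% sensitivity appears only inside the calculation; the numbers a patient can actually act on are the two outputs.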

Or once they focus on some particular quirk of wording or an unexpected number.

Meh, then. That’s the nature of headlines. In this specific case, though, it would be a headline that would get noticed and perhaps get people to read the article, but it really is not sensationalist at all; no more than any headline that informs readers of a fact that surprises them.

Women being screened SHOULD know that at least 4 out of 5 times, and possibly as many as 19 out of 20 times, a positive mammography result will be wrong (a positive predictive value somewhere between roughly 5% and 20%), more often for those under 50 than those over. Guideline creators have to look at not only whether a screen reduces mortality by some statistically measurable amount, but also what the potential harms are of achieving that, come up with some way to weigh those potential harms against the potential benefit, and present that information clearly. (My WAG is that this will not negatively impact screening, but may help individuals be less scared when they get a positive test result, understanding better that the odds still are that they are cancer free.) No several-word headline is going to do that, so every headline will be incomplete.

No, you are completely wrong, and I’d already bemoaned bad headlines and statistical reporting way back in the thread; you can go and check if you like.

I am absolutely familiar with the result of those calculations and the sort of slightly counter-intuitive conclusions that follow. The actual figure was no surprise to me at all.

As I said right at the top of the thread, I’ve listened to Tim Harford and similar podcasts for many, many years. I also recommend Ben Goldacre’s “Bad Science” as further reading on precisely these sorts of issues; Goldacre, having worked in the mainstream media, knows exactly how statistics like the “7.5%” are misused and why.

What I’m not good at is doing the calculations, which is why I asked someone to check my reasoning as I went. But the result itself? Not a surprise at all.

Let’s not, seeing as misleading headlines about healthcare, and the cynical use thereof, were the entirety of my point.

You keep repeating ad nauseam the more detailed explanation of why the 7.5% is significant. I agreed the first time; I knew that before you even mentioned it; I even listed out the calculation longhand, which shows the scope and scale of the issue. The impact of the false positives is right there in black and white. When I criticised the hypothetical headline, I did so whilst stating it was a valid figure from a certain perspective.

The 7.5% is not a mystery to me; the 7.5% is important. The 7.5% contains a whole story that bears full explanation. The 7.5% was calculated by me. The 7.5% has truth. I know what it means and what it represents.
However, the 7.5% (and figures like it) is also often jumped upon by the popular press for the purpose of misleading headlines. And that was the point I was making.

If you continue to think that I don’t understand what the 7.5% represents, even though I calculated the sodding thing in the first place, then you’ll not be able to accept the more general point that I was making.

And here is the nub of the issue.
None of the headlines suggested in this thread have said anything about “sensitivity” or “positive predictive value”, have they? They did not use that specific and clearly defined terminology, did they? Terminology that the general public are unlikely to misinterpret in the predictable way that they would misinterpret “accuracy”.

I did not read the entire thread, but from what I did read in the first 25 posts or so, I was surprised no one had broken this down algebraically, because I think doing so makes it a lot simpler.

We care about D, the total damage caused. The problem asks what values would make the two intersections equivalent, so we can use the same value D for both intersections.

We need to assign a weight to major accidents versus minor accidents in order to calculate the damage. The description doesn’t give us that information, but there is one important assumption we can safely make: a major accident has more “weight” than a minor one. That gives us the following, with x the damage per major accident, y the damage per minor accident, and N the unknown count:

2000x + 16y = D
1000x + Ny = D
x > y

Hopefully this makes it clear that 8 doesn’t work at all. I could see how someone might think 32 is a valid answer, but that would ignore that x is supposed to be the “major” weight and y the “minor”. Most likely the weight for a major accident is quite a bit higher than for a minor one, but the smallest difference between the two weights we could safely assume is zero; in other words, what if the law treated them exactly the same? That gives us:

2016x = D
(1000 + N)x = D

Hence, 1016 is the absolute minimum number of minor accidents that could possibly make these two intersections equivalent.
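As a quick sanity check of that algebra (the weight values below are arbitrary, purely for illustration): solving 2000x + 16y = 1000x + Ny for N gives N = 1000(x/y) + 16, which bottoms out at 1016 exactly when the weights are equal.

```python
# Solve 2000x + 16y = 1000x + Ny for N, given weights x (major) and y (minor).
def equivalent_minor_count(x, y):
    return (1000 * x + 16 * y) / y

print(equivalent_minor_count(1, 1))    # equal weights: 1016.0, the minimum
print(equivalent_minor_count(10, 1))   # major 10x worse: 10016.0
print(equivalent_minor_count(100, 1))  # major 100x worse: 100016.0
```

The larger the true ratio x/y, the further the honest answer moves above 1016, and the further from the intuitive 8.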

Perhaps the better question is: what would be a good headline for a new cancer test? If a new test were developed that had both a higher sensitivity and a higher specificity than the old test, then I think it would be perfectly fine to have a headline that says “New cancer test is more accurate than old tests”.

But then, what if we have a new test that has much, much greater specificity than the old test, but a slightly lower sensitivity, or vice-versa? Now, it’s not completely correct to say that it’s “more accurate than the old test”, because there are different things that could be meant by “accuracy”.

But of course, in the real world, this is usually not a difficult problem, because most tests aren’t actually completely binary; rather, they give a continuum of outcomes, which is then compared to some threshold to give the binary answer. A test might, for example, measure the number of milligrams per liter of some blood protein, such that someone with less than 2 mg/L is 99% likely not to have cancer, and someone with more than 20 mg/L is 99% likely to have cancer. You could use it as a test with 99% sensitivity, by setting the threshold at 2 mg/L, or as a test with 99% specificity, by setting the threshold at 20 mg/L, even though it’s actually the same test. Or you could set any threshold in between (and maybe, if you set the threshold at 6 or 7, you get 98% on both sensitivity and specificity).
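To make that concrete, here is a toy simulation. The Gaussian marker distributions and every number in it are invented purely for illustration, but they show how sliding the cutoff on one and the same underlying test trades sensitivity against specificity:

```python
import random

random.seed(0)
# Invented model: the marker (mg/L) tends to be low in healthy people
# and high in people with the disease. All parameters are made up.
healthy  = [random.gauss(3.0, 2.0) for _ in range(100_000)]
diseased = [random.gauss(12.0, 4.0) for _ in range(100_000)]

for threshold in (2.0, 7.0, 20.0):
    sensitivity = sum(m > threshold for m in diseased) / len(diseased)
    specificity = sum(m <= threshold for m in healthy) / len(healthy)
    print(f"threshold {threshold:4.1f} mg/L: "
          f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
```

One test, three thresholds, three very different pairs of “accuracy” numbers.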

Which means that if you’ve got a new test that has much higher sensitivity but slightly lower specificity, or vice versa, you likely can turn it into a test that’s better on both counts, just by adjusting your threshold slightly.

Alternately, you could just stop giving binary answers. Test a patient’s blood, and don’t tell them “the test is positive” or “the test is negative”; instead, just give them the probability. And to spare the public from their own ignorance of probability, you could give them the Bayesian-adjusted probability to begin with.
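A hedged sketch of what that could look like, reusing the invented marker model from the toy simulation above (all numbers remain illustrative): instead of reporting positive or negative, report the posterior probability of disease given the measured level, computed from the two distributions and an assumed prevalence.

```python
import math

def gauss_pdf(x, mu, sigma):
    # Probability density of a normal distribution at x.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior(marker_mg_per_l, prevalence=0.01):
    # Invented distributions, matching the toy model above.
    p_given_disease = gauss_pdf(marker_mg_per_l, 12.0, 4.0)
    p_given_healthy = gauss_pdf(marker_mg_per_l, 3.0, 2.0)
    num = prevalence * p_given_disease
    return num / (num + (1 - prevalence) * p_given_healthy)

for level in (2.0, 7.0, 20.0):
    print(f"{level:4.1f} mg/L -> P(disease) = {posterior(level):.1%}")
```

The same reading that a binary cutoff would flatten into “positive” or “negative” becomes a graded probability that already has the prevalence baked in.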

In some cases there is a straightforward answer to this, inasmuch as there is only one sense of “accuracy” that we should care about. Assume the two candidate tests are approximately the same low price, and the test procedure itself carries negligible risk. In the case of a screening, negative tests lead to no further action, and therefore negligible cost or benefit to the patient. We should therefore ignore both false negatives and true negatives, and consider only the cost or benefit subsequent to positive test results. The relevant statistic is the proportion of true positives among all positives: the positive predictive value. The better test is the one with the better positive predictive value.

This is, of course, a function of the prevalence of the disease as well as the properties of the test, and will vary among populations. So if we don’t want to use the technical term, we could say something like “this screening test is more accurate for this population”.
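As a sketch of that point (both tests below are hypothetical, with made-up characteristics), the same pair of tests can produce very different positive predictive values in populations with different prevalence, which is why “more accurate for this population” is the honest phrasing:

```python
def ppv(prevalence, sensitivity, specificity):
    # Positive predictive value: true positives among all positive results.
    tp = prevalence * sensitivity
    fp = (1 - prevalence) * (1 - specificity)
    return tp / (tp + fp)

# Hypothetical tests: A trades some sensitivity for much better specificity.
for prev in (0.01, 0.10):
    a = ppv(prev, sensitivity=0.75, specificity=0.99)
    b = ppv(prev, sensitivity=0.90, specificity=0.90)
    print(f"prevalence {prev:.0%}: PPV(A) = {a:.1%}, PPV(B) = {b:.1%}")
```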