Question about estimating probabilities

My question appears mathematical but I post in IMHO rather than FQ because I don’t think there’s a single clear answer. I am looking for a heuristic, an arithmetic adjustment not guaranteed to be valid but which might be reasonable in the absence of more information.

Let Prob1(X) = p1 be Detective #1’s estimate that X is true.
Sometimes it’s more convenient to use
Odds1(X) = p1 / (1 - p1)
For one thing, you can multiply an Odds by any positive number and get a valid result; but multiplying a probability of .3 by 5 spells trouble (you’d get 1.5, which is not a valid probability)!

Although he is presented with the same evidence, suppose Detective #2 has understandings different from #1. His estimate is Odds2(X) = p2 / (1 - p2), where Prob2(X) = p2.

Now suppose some new evidence – call it J – suddenly appears. Assume (a) that J is unexpected or surprising, and (b) that J strongly supports conclusion X. For example, Detective #1 might adjust his estimate via
Odds1(X|J) = 4 ⋅ Odds1(X)
For example, if his old estimate of Prob(X) was 50% (Odds = 1:1), his estimate after learning J would be 80% (Odds = 4:1).
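In code, for concreteness (a quick Python sketch of the conversions above; the helper names are just mine):

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def odds_to_prob(o):
    """Convert odds in favor back to a probability."""
    return o / (1 + o)

# Old estimate: Prob(X) = 0.5, i.e. odds of 1:1.
old_odds = prob_to_odds(0.5)     # 1.0
# New evidence J multiplies the odds by 4.
new_odds = 4 * old_odds          # 4.0, i.e. odds of 4:1
print(odds_to_prob(new_odds))    # 0.8, i.e. 80%
```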

Detective #2 might have started with a much lower estimate of Prob(X), say 20% instead of 50%. Can we guess what his new estimate will be after learning J?

Some will answer “No. We know nothing about the differing models and understandings of these two detectives. For all we know, Detective #2 will consider the new evidence J to be irrelevant, and his estimate will remain at 20%.”

Yes, I understand that. But I wonder if there is a heuristic that might be a “best guess” in the absence of more specific information.

I once thought treating the Odds multiplier as common to both detectives might be a good heuristic. E.g. given Odds1(X|J) = 4 ⋅ Odds1(X), a good guess might be
Odds2(X|J) = 4 ⋅ Odds2(X)
But I’ve forgotten why that approach seemed so logical to me! Is there a better way to model the detectives’ opinions?

Does anyone understand what I’m looking for? Any advice?

If it helps, the topic you’re looking for, on how prior probabilities change given new information, is “Bayesian statistics”.

The so-called Bayes’ Theorem literally describes the conditional probabilities as

\displaystyle P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}

if you interpret probability here as degree of belief (so your different detectives might have different prior P’s).
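For instance, plugging in made-up numbers (a toy sketch, with probability read as degree of belief):

```python
# Toy numeric example of Bayes' Theorem (all numbers invented).
p_a = 0.5              # prior degree of belief that A is true
p_b_given_a = 0.8      # likelihood of evidence B if A is true
p_b_given_not_a = 0.2  # likelihood of evidence B if A is false

# Total probability of B, then the posterior P(A|B):
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)     # 0.8
```

A detective with a different prior p_a would get a different posterior from the same likelihoods.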

If you are going with probability, I agree that Bayes’ Theorem is the way to go.
But what is actually happening here is that the detective is using probability as a stand-in for their level of certainty. To compute a real probability you need to know all possible outcomes, like the six possibilities for a die roll.
I blame Mr. Spock and R2D2 for this common confusion.

That is one approach to probability, but not the only one.

@Chronos mentioned Bayesian statistics.

It can sometimes be tricky to determine whether evidence supports a conclusion or not. Suppose X is “no human is over 12 feet tall”. The new evidence J is “we have found a human who is 11.9 feet tall”. Does J support X? Strictly speaking, J is consistent with X and therefore supports it, but practically speaking, J would greatly reduce one’s confidence in X.

The famous example is, suppose that the hypothesis is “all crows are black”. You observe a green apple. Does that support the conclusion, or not?

I also mentioned Bayes’ Theorem in my response. I guess there are some aspects of statistics where this might be useful, but given measured human irrationality about the probability of events, I’d be pretty nervous about calling it a probability in any useful sense. I’d wonder how you would do math on this. If twice the number of people now think the earth is flat, is the probability of it being flat doubled?
In any case, the examples I used were supposed to show a mathematical measure of probabilities. Spock never said “I think the odds of our surviving are 2.37%.” (Unwarranted numbers after the decimal point intentional.)

I do understand and agree with this. But none of the responders has understood MY question. :frowning: The situation is that NONE of the requisite conditional probabilities are known or available. ALL we know is that different detectives estimated different probabilities, and that both are suddenly presented with strong new evidence. (The evidence’s influence on Prob(X) is somehow quantified as ‘J_weight’.) I seek a HEURISTIC function of the form
new_prob = f(old_prob, J_weight)
which is our best chance to yield APPROXIMATE answers. And I’ve provided my tentative guess (for a suitable form of J_weight):

Odds_new = J_weight * Odds_old

To be less unclear ( :slight_smile: ) we might deduce J_weight by simply asking one of the detectives for his guesses at Odds_old and Odds_new. Our hope is that the heuristic equation we come up with will let us then GUESS APPROXIMATELY the Odds_new value for the other detective. (Assume that that detective has passed away, and cannot be asked directly.)
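To make the shape of what I’m asking for concrete, here is a rough Python sketch with my tentative odds heuristic plugged in for f (the names are just illustrative):

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

def new_prob(old_prob, j_weight):
    """My tentative heuristic: scale the odds by J_weight."""
    return odds_to_prob(j_weight * prob_to_odds(old_prob))

# Deduce J_weight by asking Detective #1: 50% -> 80% means odds went 1 -> 4.
j_weight = prob_to_odds(0.8) / prob_to_odds(0.5)   # 4.0

# Then GUESS APPROXIMATELY the late Detective #2's update from his old 20%:
print(new_prob(0.2, j_weight))                     # 0.5, i.e. 50%
```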

I had ALREADY attempted to apply Bayesian methods, but was unable to see how to do that with the MEAGER information provided. I hope one of those advocating for Bayes will show me the specific way to the heuristic function I seek. I accept that the result will not be a “real probability” but will be, at best, a good guess based on meager information.

Perhaps the best we can do is to answer: “Unknown and unknowable.” I’ll accept that, but it seems to me a “best possible guess” might exist.

I believe you already answered your own question.

I told myself I’d go away quietly after my latest post. But I’ve thought of a specific example that might give the question more credibility.

Consider the following question related to the Drake Equation. What is the probability that there is at least one other planet in our galaxy with advanced life? Suppose two “detectives” offer the guesstimates 0.1% and 50%. Although their guesstimates are very far apart, neither need be incompetent: the components of the Drake Equation are very fuzzy.

One of the terms in that Equation is the proportion of stars which meet certain criteria. Suppose that astronomers were confident of that number but suddenly discover, via the study of Webb telescope data, that those stars are ten times more common than they had thought. One detective might respond by increasing his guesstimate to 1.0%. Can we guess what change, if any, the other detective might make to his guesstimate?
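For what it’s worth, here is what my tentative odds heuristic would predict with these made-up numbers (a sketch of the arithmetic, not a claim that it’s right):

```python
# Detective #1 went from 0.1% to 1.0% after the Webb data,
# so the implied odds multiplier is:
j_weight = (0.010 / 0.990) / (0.001 / 0.999)   # about 10.09

# Applying that multiplier to the detective who started at 50% (odds 1:1):
new_odds = j_weight * 1.0
print(new_odds / (1 + new_odds))               # about 0.91, i.e. roughly 91%
```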

I continue to realize that we still have insufficient information for a reliable guess at that other detective’s new opinion. A toy problem less fuzzy than the Drake problem would mitigate this somewhat, but only a bit.

So: Is the best heuristic answer still “Unknown and unknowable”?

TL;DR: Yes.


Long form answer:

Why? Because you have zero insight into each detective’s weighting function of all their evidence before your novel J evidence is added to their personal collection of evidence to consider.

In more formulaic terms, Detective 1 has an evidence-weighting function with 23 inputs that each have a coefficient and collectively give a result of e.g. 20%. But that function is a black box. To us, and (realistically) to the detective themselves.

Meantime Detective 2 has their own function. It has 18 inputs, only 12 of which are in common with Detective 1’s. And to those common inputs D1 & D2 often assign radically different weights, or even weights of different sign. Of course, because D2’s function is just as black-boxed as D1’s, you and I and they can never know any of those details I just conjured up for my narrative.

Now, let’s throw a new bit of data J into each function. Each detective can assign it any weight between positive [huge number] and negative [huge number]. Any at all. Even zero.

And now you want a heuristic to inform your estimate, given D1’s new output value, of what D2 will choose as their new J weight and therefore their new output value? Nonsense; ain’t gonna happen.

At least not in any mathematical sense. If you want to switch to rhetoric and persuasion, you might get somewhere. At the expense of jettisoning 100% of the math and substituting handwaving about human nature and human thought processes. Warts and all.


Ultimately, confusing confidence in the correctness of an assessment of an uncertain situation with the likelihood of that assessment actually being correct makes a nonsense of the entire enquiry. In fewer words, confidence != correctness.

Those are two nearly unrelated ideas and to make any logical sense of the situation you must keep those two ideas as separate as water and electricity. Work in one or the other, but never both. And especially never both inadvertently!

Bayes’ Theorem and Bayesian statistics are not the same thing, although Bayesian statistics does depend on use of Bayes’ Theorem (though Bayes’ Theorem is also applicable to and used in classical probability).

Bayesian statistics is specifically about “using probability as a stand in for their level of certainty,” and using new evidence to mathematically update that level of certainty. To claim that this isn’t “real” probability, or that “to compute a real probability you need to know all possible outcomes,” puts you distinctly on the “classical” or “frequentist” side of the Bayesian vs frequentist debate.

(Source)

I am certainly not an expert in Bayesian statistics. I don’t think I understand either the OP’s question or Bayesian statistics well enough to tell whether or how it applies to that question. But it certainly seems like it could be relevant to a situation where “different detectives estimated different probabilities, and that both are suddenly presented with strong new evidence.”

I may be wrong, but it sounds to me that you’re trying to combine precise mathematics with guesses. I agree that the detectives will change their assigned probabilities with new information, but I have no idea of how to quantify that precisely. Bayes kind of comes in because the new information reduces some of the uncertainty and thus builds on the already decided probability without the detective having to start from scratch.
When I was in college I took Theory of Knowledge, pretty much ruined by a couple of hard solipsists who wouldn’t shut up. I did my paper based on Lord Keynes’ first book, which “solved” this problem by saying that although we’ll never prove we’re not brains in vats, we can calculate the probability we aren’t (in the same way you’re talking about), and this probability approaches one, getting closer the more we experience. I later found out he didn’t invent this concept. But Keynes’ math was fuzzy also.

Keynes used it also. I can’t tell from the Wiki article on Bayesian statistics when it was formally developed, but Keynes wrote a bit over a century ago, which was probably before that.
The key here is “mathematically.” I agree it is mathematical in that it gets assigned a value, but maybe not in the sense that you can do reasonable computations of it.
But I do agree the concept is useful.

This is way more tractable than the above discussion implies. Not knowing how the detectives arrived at their initial probabilities isn’t a real problem here.

Let’s assume that Detective #1 assigns an initial (“prior”) probability of p_1 that X is true. Detective #2 assigns a prior probability of p_2. How they got to these numbers doesn’t matter as long as their prior evidence is adequately independent of the new evidence that will appear. This isn’t a crazy assumption. They may have different prior knowledge of the case, or different initial unconscious biases, or whatever. But all we will need below is that the new evidence is something concrete that is understood in the same way by the two detectives (formalized in a moment).

Okay, so new evidence J is found.

To update a detective’s beliefs, we need two quantities:

(1) What is the likelihood of observing evidence J if X is true: P(J|X).
(2) What is the likelihood of observing evidence J if X is false: P(J|\neg X).

I’ve used the symbol “\neg” to indicate negation.

Example: Is Jim the burglar (X)? The shoeprints in the dirt outside match Jim’s shoe type (J). How likely is it to find his type of shoeprint outside if he is the burglar, P(J|X)? How likely is it to find his type of shoeprint if he is not the burglar, P(J|\neg X)?

Let’s assume the two detectives agree on these two likelihoods. Importantly, this can easily be true despite the differences in their prior probabilities. After all, the probability that a generic burglar leaves a shoeprint can be estimated by forensic experience, and the probability that a generic shoeprint is of a specific type is a matter of shoe sales in the area, etc. These are all things we will assume the two detectives agree on despite any differences in their prior knowledge (or the weighting thereof) on any other aspects of the case.

Detective #1 finds a new probability p_1'. An obvious approach is to assume his updating follows Bayes’ rule, which means:

p_1'=\frac{P(J|X)p_1}{P(J|X)p_1+P(J|\neg X)(1-p_1)} .

We are told p_1 and p_1'. We have two unknown, independent likelihoods P(J|X) and P(J|\neg X) that we would like to infer from it. Instead, let’s just algebraically solve for one of these in terms of the other.
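Explicitly, that rearrangement gives:

P(J|\neg X)=P(J|X)\cdot\frac{p_1(1-p_1')}{p_1'(1-p_1)} .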

Now we turn to Detective #2. He has the same update rule, just now with p_2' on the left and p_2 everywhere on the right. We will assume, as discussed, that he understands the new evidence in the same way. That is, he has the same likelihoods P(J|X) and P(J|\neg X). And, we have a linear relation between these, so we can replace (say) P(J|\neg X) with some coefficient times P(J|X) in Detective #2’s updating rule. When we do that, the P(J|X) factors in the numerator and the denominator cancel.

I haven’t written out the rest of this algebra here, but it’s rather straightforward if you know algebra, and not interesting if you don’t. In the end, I get:

\frac{1}{p_2'}=1+\left(\frac{1-p_2}{p_2}\right)\left(\frac{p_1}{1-p_1}\right)\left(\frac{1-p_1'}{p_1'}\right)

This immediately reflects your intuition that odds may appear. Labeling odds with the symbol O and carrying over the same subscripts and primes, the above can be rewritten tidily:

O_2'=\frac{O_2 O_1'}{O_1}

For a specific example (yours in fact), say Detective #1 starts with p_1=0.5 and ends with p_1'=0.8, and say Detective #2 starts with p_2=0.2. In terms of odds, this is O_1=1, O_1'=4, and O_2=0.25. This means O_2'=1, or a 50%/50% estimate.
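If you want to sanity-check this numerically, here is a quick Python sketch (the value chosen for P(J|X) is arbitrary; only the ratio matters, per the algebra above):

```python
def bayes_update(p, l_x, l_not_x):
    """Posterior from prior p and the likelihoods P(J|X), P(J|not-X)."""
    return l_x * p / (l_x * p + l_not_x * (1 - p))

p1, p1_new, p2 = 0.5, 0.8, 0.2

# Likelihood ratio P(J|X)/P(J|not-X) implied by Detective #1's update:
ratio = (p1_new / (1 - p1_new)) / (p1 / (1 - p1))   # 4.0

l_x = 0.5              # arbitrary choice of P(J|X)
l_not_x = l_x / ratio

# Direct Bayes update for Detective #2:
print(bayes_update(p2, l_x, l_not_x))   # 0.5

# Same answer from the odds rule O2' = O2 * O1'/O1:
o2_new = (p2 / (1 - p2)) * ratio
print(o2_new / (1 + o2_new))            # 0.5
```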

I think this is exactly the update rule you are looking for.

And rereading your OP, I see this is exactly the update rule you were proposing. So, yes, your odds ratio approach works given the assumptions I’ve outlined.

Thank you, Pasta. All we need to assume is that J’s effect is independent of whatever the difference is between the two Detectives’ estimates.

I actually figured this out after a 3rd cup of coffee yesterday and clicked to SDMB to post it myself. And I’d intuited this a few years ago – perhaps I’d have remembered the proof had I worked it out laboriously then rather than just having a flash of intuition.

Of all the things I’ve lost, I miss my mind the most.

Yes, and also that their estimates of the probability of J are the same in both the X and \neg X cases, which is often a separate requirement from just independence from their prior probabilities.

For pedagogical completeness, we can extend the shoeprint example to break the assumptions.

Let’s say that Detective #2 found an invoice for shoe repair showing that Jim’s evidentiary shoes were actually at the cobbler on the week of the crime. Detective #2 would assign a lower value to P(J|X) than Detective #1. That is, Detective #2 would say it is less likely to see this shoeprint at the crime scene if Jim were the burglar. Jim would need to have a second pair of the same shoes or something. Or maybe this is a clever ruse and he planned the shoe repair timing, making it more likely! Either way, the detectives would disagree on P(J|X) but not because of any connection to the prior probabilities.

It would not be hard to add story elements that would lead to differences in P(J|\neg X) as well, without connection to the prior probabilities.

For connections to the prior differences, we can change the additional evidence that Detective #2 has in hand. Detective #2 had gotten ahold of a receipt that shows Jim completing his appointment for shoe repair on the very same evening as the crime. Even before any shoeprints are found, Detective #2 sees that the time conflict makes it very hard (but not impossible) for Jim to get to the crime scene in time. This is why the detectives disagree on their prior probability estimates. When the new shoe print evidence J is found, Detective #2 notes that this either makes the time conflict even stronger given the time needed to change shoes or it makes the shoeprints less likely to be Jim’s.