Carl Sagan’s claim that ‘extraordinary claims require extraordinary evidence’ is often repeated in the face of phenomena that seem to defy prior understanding. A good example is the recent (apparent) discovery of faster-than-light neutrinos: even though the experimental finding is robust by the usual standards, the claim is widely met with skepticism – and, I want to argue, rightly so.

There’s a rigorous notion behind Sagan’s aphorism, and it’s to do with what’s called Bayesian inference. The gist of it is that you enter any given situation with an expectation of what you’ll encounter, given by an assignment of probabilities to certain possible events. This probability distribution is called the **prior probability**, or just **prior** for short.

Depending on what actually occurs, you get new information about the situation you’re in; this information changes your expectations and thus leads you to assign new probabilities to the events that may occur in this situation. This new probability distribution is known as the **posterior probability**, or **posterior**. Basically, you get more familiar with the situation, and your knowledge of what to expect becomes more accurate. This process is known as **Bayesian updating**.

This notion is particularly useful for *hypothesis testing*: given a certain hypothesis and some observed data, how strongly does the data support the hypothesis? In other words, how likely is it that the hypothesis is true, given what we have observed?

To get a grasp of this, it’s best to see it at work. Let’s say there are two identical cookie jars, A and B. Both are filled with 20 cookies. In jar A, 10 cookies have chocolate chips in them, while the other 10 are plain; in jar B, only two cookies have chocolate chips, 18 being of the plain variety. You reach in, and pull out a chocolate chip cookie. The question is now: what is the probability you should attach to the hypothesis of having jar A?

Let’s start with figuring out the prior. Both jars are indistinguishable, so the probability of selecting either must be equal; since the total probability must be one (you definitely select one of the two jars), it follows that you assign a probability of 50% to having either jar.

The probability of the data arising, i.e. of drawing a chocolate chip cookie, is the probability of the data arising if you have jar A times the probability of having jar A, plus the probability of the data arising if you have jar B times the probability of having jar B. That is, 0.5*0.5 (the probability of getting a chocolate chip cookie if you have jar A is 50%, and the probability of having jar A is 50%) + 0.1*0.5 (the probability of getting a chocolate chip cookie if you have jar B is 10%, and the probability of having jar B is 50%) = 0.3.

Now, Bayes’ theorem tells us that the (posterior) probability of the hypothesis being true – i.e. of you having selected jar A – is equal to the probability that the data arises if the hypothesis is true – i.e. that you select a chocolate chip cookie if you have jar A – times the prior probability, divided by the probability of the data arising. In symbolic form: P(having jar A if drawn chocolate cookie) = P(drawing chocolate cookie if having jar A) * P(having jar A)/P(drawing chocolate cookie).
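Written in conventional notation, with $C$ standing for ‘a chocolate chip cookie is drawn’ and $A$, $B$ for the hypotheses of having jar A or jar B (labels chosen here for convenience), Bayes’ theorem with the denominator expanded as in the previous paragraph reads:

```latex
P(A \mid C) = \frac{P(C \mid A)\,P(A)}{P(C)}
            = \frac{P(C \mid A)\,P(A)}{P(C \mid A)\,P(A) + P(C \mid B)\,P(B)}
```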

This means that after having drawn a chocolate cookie, you should assign to the hypothesis of having cookie jar A a probability of 0.5*0.5/0.3 = 0.83; you have become much more confident that you indeed have jar A. This is as expected: jar A contains many more chocolate cookies than jar B, so the data of drawing a chocolate cookie supports the hypothesis of having jar A.

But the crucial point in this analysis is that your judgement does not depend exclusively on the data you receive, but also on the prior probability you assign to the hypothesis. So let’s examine the same experiment, but with the difference that your mother has given you the cookie jar, stating that it’s jar B.

Say you trust your mother (as you should), but everyone is fallible, and perhaps she has confused the jars; so you assign to the hypothesis of having jar B a probability of 90%, and consequently to the hypothesis of having jar A a probability of 10%. Again, you draw a chocolate chip cookie out of the jar, i.e. you make exactly the same observation as in the previous case. What probability should you now assign to the hypothesis that you have jar A?

Well, things proceed just the same way. The probability of drawing a chocolate cookie is now 0.5*0.1 (the probability of drawing a chocolate cookie out of jar A, times the probability you now assign to having jar A) + 0.1*0.9 (the probability of drawing a chocolate cookie out of jar B, times the probability of having jar B) = 0.14. So the probability you assign to the hypothesis of having jar A after drawing a chocolate cookie is now 0.5*0.1/0.14 = 0.36! You do now assign a substantially higher probability to that hypothesis (up from 0.1), but since you were very confident beforehand that it was wrong, the data does not suffice to change your opinion: you’re still fairly certain that you have jar B (after all, your mother told you so). As a quick check, look at the probability you assign to having jar B: 0.1*0.9/0.14 = 0.64. You’re still more confident that you have jar B, even though the data actually favours the hypothesis that you have jar A.
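The arithmetic of both scenarios can be checked with a few lines of Python. This is a minimal sketch: the function name `posterior_a` and its default parameters (the 50% and 10% chocolate proportions of jars A and B) are my own illustrative choices, not part of any library.

```python
def posterior_a(prior_a, p_choc_a=0.5, p_choc_b=0.1):
    """Posterior probability of having jar A after drawing one chocolate chip cookie.

    prior_a  -- prior probability of having jar A
    p_choc_a -- probability of drawing a chocolate cookie from jar A (10 of 20)
    p_choc_b -- probability of drawing a chocolate cookie from jar B (2 of 20)
    """
    prior_b = 1.0 - prior_a
    # Probability of the data (a chocolate cookie), summed over both jars.
    p_choc = p_choc_a * prior_a + p_choc_b * prior_b
    # Bayes' theorem: P(A | chocolate) = P(chocolate | A) * P(A) / P(chocolate).
    return p_choc_a * prior_a / p_choc


# Same observation, two different priors for jar A.
for label, prior in [("no information", 0.5), ("mother says jar B", 0.1)]:
    post = posterior_a(prior)
    print(f"{label}: P(A) = {prior:.2f} -> P(A | chocolate) = {post:.2f}, "
          f"P(B | chocolate) = {1 - post:.2f}")
```

Running it reproduces the figures in the text: 0.83 for the first scenario, and 0.36 (with 0.64 for jar B) once your mother’s word is taken into account.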

There are, I believe, four conclusions to draw from this. The first is that extraordinary claims do indeed require extraordinary evidence: no experiment occurs in isolation; there are always reasons for expecting certain outcomes, and those reasons can be very good. So overthrowing a well-confirmed theory requires a substantial weight of evidence; in particular, more evidence than it takes to re-confirm the theory, or to decide between two theories that so far are equally well supported. There is thus no double standard involved in demanding a higher level of scrutiny for highly unexpected data (as those who accuse scientists of ‘ignoring’ certain data are sometimes wont to claim); in fact, it’s the rational thing to do.
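To put a rough number on ‘a substantial weight of evidence’, here is a hypothetical sketch that counts how many chocolate chip cookies in a row it takes before jar A becomes the more probable hypothesis. Beyond the setup above, it assumes that each cookie is put back after drawing (so the proportions never change); the function `draws_to_flip` is made up for illustration.

```python
def draws_to_flip(prior_a, p_choc_a=0.5, p_choc_b=0.1, threshold=0.5, max_draws=1000):
    """Count consecutive chocolate draws (assumed with replacement) needed
    before the posterior probability of jar A exceeds `threshold`."""
    p = prior_a
    for n in range(max_draws + 1):
        if p > threshold:
            return n
        # One Bayesian update on a chocolate chip cookie;
        # the posterior becomes the prior for the next draw.
        p = p_choc_a * p / (p_choc_a * p + p_choc_b * (1 - p))
    return None  # threshold not reached within max_draws


for prior in [0.5, 0.1, 0.001]:
    print(f"prior for jar A = {prior}: {draws_to_flip(prior)} chocolate draw(s) needed")
```

With the jar proportions above, each chocolate cookie multiplies the odds in favour of jar A by 0.5/0.1 = 5, so the posterior for jar A passes 50% after one draw from a prior of 0.5, after two from a prior of 0.1, and only after five from a prior of 0.001.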

The second, equally important one is that two people can be confronted with the same data, can act equally rationally, and nevertheless come to different judgments. This is true in particular in debates: arguments you find entirely convincing won’t necessarily convince your opponent, and this does not mean that he is obstinate, refuses to accept your logic, or is just unwilling to give in – he may be acting just as rationally as you are, just coming at it from a different point of view, so to speak.

The third is that you must never be too certain in your judgments. From the considerations above, it becomes immediately clear that you can never be convinced of anything you assign a probability of 0 to, even if it is true: the posterior is the prior multiplied by some factor, and zero times anything is zero. This is, in a nutshell, what lurks behind conspiracy theories and other cases where people seem simply incorrigible: once they have settled on some conclusion with absolute certainty, no amount of contrary evidence will ever suffice to change their minds. This is known as Cromwell’s rule, in reference to Oliver Cromwell’s appeal to the Church of Scotland:

> I beseech you, in the bowels of Christ, think it possible that you may be mistaken.
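The same update rule makes Cromwell’s rule concrete. In this sketch (again assuming draws with replacement, with a hypothetical helper `update`), twenty chocolate cookies in a row drive a prior of one in a million for jar A to near certainty, while a prior of exactly 0 never moves.

```python
def update(prior_a, p_choc_a=0.5, p_choc_b=0.1):
    """One Bayesian update of P(jar A) after drawing a chocolate chip cookie."""
    return p_choc_a * prior_a / (p_choc_a * prior_a + p_choc_b * (1 - prior_a))


for prior in [0.0, 1e-6]:
    p = prior
    # Twenty chocolate cookies in a row (hypothetical data, drawn with replacement).
    for _ in range(20):
        p = update(p)
    print(f"prior {prior:g}: posterior after 20 draws = {p:.6f}")
```

The prior of 0 appears as a factor in the numerator of every update, so no sequence of observations can lift it, however strongly the data favours jar A.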

The fourth, and perhaps most surprising, conclusion I would like to draw is that *no two persons, if they both act equally rationally, can ever be in total agreement*. The reason is simple: two persons, if they are in fact different persons, will bring different experiences, different amounts of knowledge, different judgments etc. into any given situation. In general, then, they will assign different priors, which, as we have seen, leads to different judgments even in the face of identical data. This difference is persistent: you can of course iterate the updating process (using a previously generated posterior as the new prior), and this iteration converges in the limit towards the true probabilities (provided the truth was not assigned a probability of 0 to begin with), but after any finite number of steps, two people who started out with different priors will still not arrive at the same posteriors.
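The persistence of disagreement can be sketched the same way. Two hypothetical observers, here called `alice` and `bob`, start from the priors of the two scenarios above (0.5 and 0.1 for jar A) and see the same hypothetical sequence of ten chocolate chip cookies, drawn with replacement. Exact fractions (Python’s standard `fractions.Fraction`) are used so that rounding can neither hide nor create a difference.

```python
from fractions import Fraction


def update(prior_a, p_choc_a=Fraction(1, 2), p_choc_b=Fraction(1, 10)):
    """Exact Bayesian update of P(jar A) after one chocolate chip cookie."""
    return p_choc_a * prior_a / (p_choc_a * prior_a + p_choc_b * (1 - prior_a))


# Hypothetical observers with different priors for jar A.
alice, bob = Fraction(1, 2), Fraction(1, 10)
for step in range(1, 11):
    # Both see the same chocolate cookie and apply the same rule.
    alice, bob = update(alice), update(bob)
    print(f"step {step:2d}: alice {float(alice):.8f}  bob {float(bob):.8f}  "
          f"gap {float(alice - bob):.2e}")
```

The gap between the two posteriors shrinks rapidly (after ten draws it is below one in a million), but it never reaches zero: each update multiplies both observers’ odds by the same factor of 5, so Alice’s odds always remain exactly 9 times Bob’s.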

This, I think, is a somewhat underappreciated snag in any ‘search for truth’, be it scientific or philosophical. Different persons are enough to produce disagreements that persist for all finite times, since only identical persons, shaped by identical experiences, will have identical priors. So the next time somebody just refuses to see your point, or to be convinced by your arguments, perhaps consider that it doesn’t (necessarily) mean they are stupid, obstinate, or delusional; they’re just not you. (Of course, they may still be stupid, obstinate, and delusional anyway…)

This has gotten a bit longer than anticipated; thanks to anybody who has persisted to this point.

What I’d like to debate are questions like the following: How should we best act, given that we may never agree on certain things? Is all debate pointless? What do we do in the face of incorrigible people? As they can’t be convinced, it would seem futile to engage them in debate. But if they receive no opposition, their point of view may proliferate unhindered. Should we let them persist, or oppose their viewpoints so as not to give them undue weight? And how does one best agree to disagree – i.e. acknowledge that what suffices to convince me may not convince you, without this necessarily meaning that either of us is wrong?