Can anyone explain Bayes Rule to me?

I was reading this thread More Probability Questions as I’m studying statistics too, and since you guys did such a good job making that stuff clearer I was wondering if anyone could tackle Bayes Rule.

I heard it in class, and I have a textbook, but I just do not get it. What’s the rationale behind it? How does it work? Why on earth would I want to do it? Etc.

Thanks!

So in that thread, in my first post, I outlined what I called rules 0 through 4 of probability. Number 4 essentially defines conditional probability: it says P(A & B) = P(A) * P(B | A), where “P(B | A)” means “the probability of B given that A is true”.

So what? Well, A & B’s the same as B & A. So we could just as well decompose P(A & B) the other way, into P(B) * P(A | B), where the second term means “the probability of A given that B is true”.

So now what? Well, now we know that P(A) * P(B | A) = P(B) * P(A | B). Move terms around, and you get that P(A | B) = P(B|A) * P(A) / P(B). This is Bayes’ Rule/Theorem/Law. It tells us how to calculate the probability of A given that B is true, from the probability of B given that A is true, the probability of A, and the probability of B. It’s often useful because in many situations you want to know a conditional probability, but it’s the converse probability that is actually easy to immediately calculate. Using Bayes’ Rule, you often have a nice way to move between the two.
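
To make that concrete, here is a small sketch with made-up numbers (the 1%, 90% and 5% figures are assumptions for illustration, not from any real test). A is "has the condition", B is "tests positive". P(B | A) is the easy one to know; P(A | B) is the one you actually want. P(B) is obtained here by adding up the two ways of testing positive.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Hypothetical numbers, chosen only for illustration.
p_a = 0.01              # P(A): has the condition
p_b_given_a = 0.90      # P(B | A): tests positive if they have it
p_b_given_not_a = 0.05  # P(B | not A): tests positive if they do not

# P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 4))  # about 0.1538
```

Even though P(B | A) is 0.90 here, P(A | B) comes out at roughly 0.15, which is why the two should not be confused.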

First, here’s a good introductory page on Bayes’s formula and its applications: An Intuitive Explanation of Bayesian Reasoning. It has lots of interactive examples.

The most basic form of Bayes’s formula is

p(A | B) = p(B | A) * p(A) / p(B).

To see what this means, and why it’s true, imagine that you have a dart board, with the area of the dart board equal to exactly 1 area-unit. Suppose that you are going to throw a dart at the board. Let’s assume that you aren’t going to aim the dart at any particular part of the board, but the dart is guaranteed to land somewhere on the board. That is, it’s definitely not going to land outside the board, but it is no more likely to land on any one part of the board than on any other.

It follows from these assumptions that, if you shade in a region A on the board, the probability that your dart will land in A is exactly the area of A.

Some notation: We denote the area of a region A by p(A). We use the letter “p”, of course, because, as mentioned, this is the probability that the dart will land in A. Also, suppose we draw two regions on the board, A and B. Then we denote the region of their overlap by A & B. Hence, p(A & B) is just the probability that the dart will land both in A and in B – that is, in their overlap.

Now consider the following situation: For whatever reason, we are only interested in those cases where the dart lands in region B. Furthermore, we want to know, given a dart-throw landing in B, what is the probability that it also lands in A? We denote this value by p(A | B).

Observe the difference between p(A | B) and p(A & B). The value p(A & B) is the proportion of dart throws that land in A & B out of all throws, including those throws that miss B altogether. The value p(A | B), on the other hand, is just the proportion out of those throws that at least hit B.

So, to recapitulate, p(A | B) is the probability that the dart hits A, given that it hits B. But we saw above that we can think of probabilities as the areas of regions on a dart board of area 1. So, to compute p(A | B), we can forget everything outside of B and just think of B itself as an entire dart board (though perhaps an oddly-shaped one). Then A & B is a region inside this new board.

This means that p(A | B) is the area of A & B after re-scaling so that B has area 1. That is, we are now computing areas in B by taking their “original” areas and dividing through by the area of B (which, you should note, makes the re-scaled area of B itself equal to 1, as desired).

As an equation, the content of that last paragraph is

p(A | B) = p(A & B) / p(B) … (1).
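
If you want to see equation (1) in action, here is a rough simulation sketch. The board is a 1 x 1 square, and the two rectangles used for A and B are hypothetical regions I picked just for illustration. Darts are thrown uniformly at random; the fraction of B-hits that also land in A should come out close to the area of the overlap divided by the area of B.

```python
import random

# Hypothetical regions on a 1 x 1 board (chosen only for illustration).
def in_a(x, y):
    return x < 0.5                 # left half of the board, area 0.5

def in_b(x, y):
    return x < 0.8 and y < 0.4     # a rectangle, area 0.32

def throw_darts(n, seed=0):
    """Throw n uniform darts; return estimates of p(A & B), p(B), p(A | B)."""
    rng = random.Random(seed)
    hits_b = 0
    hits_ab = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if in_b(x, y):
            hits_b += 1
            if in_a(x, y):
                hits_ab += 1
    # p(A | B) only counts throws that hit B, so it divides by hits_b, not n.
    return hits_ab / n, hits_b / n, hits_ab / hits_b

p_ab, p_b, p_a_given_b = throw_darts(200_000)
print(p_ab, p_b, p_a_given_b)
```

With these rectangles the exact areas are p(A & B) = 0.2 and p(B) = 0.32, so the estimate of p(A | B) should hover around 0.2 / 0.32 = 0.625.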

But, we could just as well have carried out this reasoning with A and B switched. We would then have found that

p(B | A) = p(B & A) / p(A).

Of course, the area of the overlap of A and B is the same whether you put A or B first. That is, p(B & A) = p(A & B). So this last equation becomes

p(B | A) = p(A & B) / p(A) … (2).

Now we have p(A & B) in two equations. Solving for it in (2), and then plugging the result into (1), gives us

p(A | B) = p(B | A) * p(A) / p(B).

And this is Bayes’s formula.
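
As a sanity check, here is a sketch that plugs exact areas into equations (1) and (2) and into Bayes’s formula. The three areas are hypothetical (think of A as the left half of the board and B as a rectangle overlapping it); exact fractions are used so there is no rounding to worry about.

```python
from fractions import Fraction

# Hypothetical areas on a board of total area 1.
p_a = Fraction(1, 2)        # area of A
p_b = Fraction(8, 25)       # area of B
p_a_and_b = Fraction(1, 5)  # area of the overlap A & B

p_a_given_b = p_a_and_b / p_b   # equation (1)
p_b_given_a = p_a_and_b / p_a   # equation (2)

# Bayes's formula: p(A | B) = p(B | A) * p(A) / p(B)
bayes_rhs = p_b_given_a * p_a / p_b

print(p_a_given_b, bayes_rhs)  # both print as 5/8
```

Both routes give the same number, as the derivation says they must.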

Go to IMDB and get a practical education. They say:

*The formula for calculating the Top Rated Titles gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top (currently )
C = the mean vote across the whole report (currently )
*

Look at a few different movies, some with lots of votes and some with only a few, and you will understand it.
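
Here is a quick sketch of that formula so you can try it yourself. The values of m and C below are placeholders I made up (the real ones are not given above), and so are the two example movies.

```python
def weighted_rating(R, v, m, C):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C, as quoted above."""
    if v + m <= 0:
        raise ValueError("v + m must be positive")
    return (v / (v + m)) * R + (m / (v + m)) * C

# Placeholder values, NOT the site's real settings.
m = 1000   # minimum votes required to be listed
C = 7.0    # mean vote across the whole report

# Two hypothetical movies with the same raw average but different vote counts.
few_votes = weighted_rating(R=9.0, v=100, m=m, C=C)
many_votes = weighted_rating(R=9.0, v=100_000, m=m, C=C)

print(round(few_votes, 2), round(many_votes, 2))  # about 7.18 and 8.98
```

With few votes the weighted rating stays close to the overall mean C; with many votes it moves toward the movie's own average R.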

The intuitive idea behind Bayes’ rule is that it gives you a rule for updating your beliefs as new evidence becomes available. Wikipedia has a good write-up; in particular, pay attention to section 1.
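
A minimal sketch of that "updating" idea, under made-up assumptions: a coin is either fair (heads 50% of the time) or biased (heads 80% of the time), you start out 50/50 between the two, and you apply Bayes' rule once per flip. The flip sequence and all the numbers are hypothetical.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayes update: return P(H | evidence) from P(H) and the two likelihoods."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Hypothetical setup: H = "the coin is biased" (heads with probability 0.8);
# the alternative is a fair coin (heads with probability 0.5).
belief = 0.5          # prior P(H) before seeing any flips
history = [belief]

for flip in "HHTH":   # made-up sequence of observed flips
    if flip == "H":
        belief = update(belief, 0.8, 0.5)
    else:
        belief = update(belief, 0.2, 0.5)
    history.append(belief)

print([round(b, 3) for b in history])
```

Each posterior becomes the prior for the next flip: heads push the belief in "biased" up, tails pull it back down.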

Ah, this is cool, guys. I’m telling you, some of you should be writing textbooks instead of whoever they have doing it now! I will definitely be looking over those websites. I tried googling it myself but got a lot of obviously unrelated stuff.

Yeah, pretty good descriptions… it’s pretty annoying trying to find good info on this stuff.
Even when it gets complex, like with Kalman filters, I still think there are intuitive ways to explain it.

If it still doesn’t make sense, I found some good vids on it on YouTube.
There is this one from China and this other one that has a lot of stuff on recursive Bayes.
They seem decent. I dunno… they helped me.