Question About Bayesian Analysis

psychloan · August 12, 2006, 2:46pm

This was actually inspired by a question that came up in a ‘pit’ thread. I’ve changed the question that came up so as to make it less inflammatory. I believe my analysis is correct, but I would like somebody who is comfortable with Bayesian analysis to tell me if I’m off base:

There is a population of 2,000,000 people made up of 1,000,000 fleepers and 1,000,000 meepers.

There is an offense among these people known as tickling. However, fleepers only tickle meepers; and meepers only tickle fleepers.

Every time a person is tickled, he reports it to the Tickling Authority who investigates the claim.

Unfortunately, there are also a large number of FTR’s: False tickling reports. It is known that 25% of all tickling reports are false.

Every year, the Tickling Authority receives approximately 13,333 tickling reports.

Two other facts of note: For reasons that are not entirely clear, there is a tremendous imbalance in the incidence of tickling in the two populations:

99% of tickling incidents involve fleepers tickling meepers. Only 1% go the other way. (These are tickling incidents that are investigated and confirmed by the tickling authority.)

Lastly, members of both groups are equally likely to lie. Thus, a fleeper that has not been tickled is just as likely to file an FTR (false tickling report) as a meeper who has not been tickled.

The question is as follows: If a fleeper lodges a tickling report, what is the likelihood that it is false? Same question for a meeper. And are the two probabilities different?

kellner · August 12, 2006, 4:04pm

I hope I didn’t make too many mistakes, but here is my take:

We have 13333 reports. 25% of those are false, that leaves about 10000 true incidents, assuming that all incidents are reported.

Among those are
9900 fleeper-on-meeper and
100 meeper-on-fleeper incidents.

There are 3333 false reports and 1990000 people who haven’t been tickled. So the probability of someone who hasn’t been tickled reporting incorrectly is about 0.0017.

Therefore the
990100 meepers who haven’t been tickled file about 1658 false reports and the
999900 fleepers who haven’t been tickled file about 1675 false reports.

Meepers file 9900 true and 1658 false reports (=11558): about 14% are false.
Fleepers file 100 true and 1675 false reports (=1775): about 94% are false.

(please excuse my totally unsystematic rounding)
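
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the steps above. It carries the same assumptions as the post (every real incident is reported, nobody is tickled twice, and non-tickled members of both groups file false reports at the same rate). The variable names are made up for illustration, and it uses unrounded intermediate values, so the counts differ slightly from the rounded ones above.

```python
# Sketch of the arithmetic in the post; names are illustrative only.
POPULATION_PER_GROUP = 1_000_000
TOTAL_REPORTS = 13_333
FALSE_SHARE = 0.25               # 25% of all reports are false
FLEEPER_ON_MEEPER_SHARE = 0.99   # share of real incidents with a meeper victim

false_reports = TOTAL_REPORTS * FALSE_SHARE
true_reports = TOTAL_REPORTS - false_reports   # assumes all real incidents are reported

meeper_true = true_reports * FLEEPER_ON_MEEPER_SHARE   # meepers who really were tickled
fleeper_true = true_reports - meeper_true              # fleepers who really were tickled

# Per-person false-report rate among everyone who was not tickled
# (assumes nobody was tickled more than once).
not_tickled = 2 * POPULATION_PER_GROUP - true_reports
false_rate = false_reports / not_tickled

meeper_false = (POPULATION_PER_GROUP - meeper_true) * false_rate
fleeper_false = (POPULATION_PER_GROUP - fleeper_true) * false_rate

meeper_false_share = meeper_false / (meeper_true + meeper_false)
fleeper_false_share = fleeper_false / (fleeper_true + fleeper_false)

print(f"false-report rate per non-tickled person: {false_rate:.4f}")
print(f"share of meeper reports that are false:  {meeper_false_share:.1%}")
print(f"share of fleeper reports that are false: {fleeper_false_share:.1%}")
```

The two shares should land close to the 14% and 94% figures quoted above. Small differences come only from rounding.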

Gorsnak · August 12, 2006, 4:18pm

That sounds about right to me, but it assumes that:

1. no one is tickled more than once
2. no one who was actually tickled filed an additional false report
3. false tickling reports arise completely ex nihilo, as opposed to stemming from quasi-tickling-like incidents followed by misunderstandings, etc.

1 and/or 2 being false would have a modest impact resulting in meeper reports being more likely to be false and fleeper reports being less likely, but I’m too lazy to do the math. 3 being false could have a huge impact, depending on the specifics, to the point of making the entire question unanswerable.

This is about that Bricker pit thread I’ve been avoiding reading, isn’t it?

psychloan · August 12, 2006, 6:01pm

Lol. Yes, unfortunately I seem to be the only person posting in the thread at the moment who understands Bayesian reasoning.

psychloan · August 12, 2006, 6:03pm

kellner:

Thank you. I came up with about the same numbers. I’m not crazy. Lol.

Yeah · August 12, 2006, 6:54pm

I get 0.1376835 and 0.9460227 letting OpenOffice Calc do the math (and rounding).

Jake · August 12, 2006, 7:02pm

We need a smiley depicting eyes glazing over.
Thank you and goodnight.

Noel_Prosequi · August 13, 2006, 1:45am

Thanks for starting a thread that takes the heat out of that Pit thread. Chernobyl ain’t in it, down there. I’m glad of the opportunity to express a view without having to be aligned with the vitriol.

For mine, the part of your problem quoted above is (a) difficulty. Let us assume that Meepers and Fleepers have an exactly equivalent genetic and cultural predisposition to lying, as a general proposition. Lying, however, is context dependent. In any individual case and on any particular topic, whether a person lies or not is powerfully affected by the extent to which they expect to be believed. Thus, whereas the lying rate of Meepers and Fleepers will be exactly equal on some neutral subject such as what they ate for breakfast, it necessarily changes where the subject of the lie is an area where there is already an asymmetry between Meepers and Fleepers, because of the factor of expectation of belief.

Partly for these reasons, I doubt that probabilistic analyses, Bayesian or otherwise, are of any use in determining the truth or otherwise of past events. The allegation is true or false. There was tickling or there was not. (We needn’t be troubled by considerations at the margin such as misunderstandings, etc, for present purposes). Schrodinger’s cat does not apply at the macro level.

Future events, maybe (what is my chance, as a Meeper, of being tickled if I go out tonight?). And of course probability is useful in describing large numbers of cases (for purposes of resource allocation and the like).

But not past events. What 100 or 1000 or 1000000 other people in similar circumstances may have done has no bearing on whether a particular Fleeper or Meeper was a tickler/ticklee. The facts of past events have crystallised. The problem that we may not know reliably what those facts are is not one that can meaningfully be answered by probability analysis of different cases.

Different scenario. The vast majority of the people the police arrest are guilty. We know this from pleas of guilty, verdicts at trial, etc. When you add them up, about 90%+ of people charged are convicted in most jurisdictions. Does that help us tell whether a particular suspect before a particular jury is guilty or not? Of course not. He is either guilty or not according to the facts of his case. If we don’t have those facts, then supposed probability analysis adds nothing. It provides an illusion of significance, but is in reality an error of the order of “looking for the sixpence under the lamppost”.

psychloan · August 13, 2006, 3:15am

Noel Prosequi:

Thanks for starting a thread that takes the heat out of that Pit thread. Chernobyl ain’t in it, down there. I’m glad of the opportunity to express a view without having to be aligned with the vitriol.

For mine, the part of your problem quoted above is (a) difficulty. Let us assume that Meepers and Fleepers have an exactly equivalent genetic and cultural predisposition to lying, as a general proposition. Lying, however, is context dependent. In any individual case and on any particular topic, whether a person lies or not is powerfully affected by the extent to which they expect to be believed. Thus, whereas the lying rate of Meepers and Fleepers will be exactly equal on some neutral subject such as what they ate for breakfast, it necessarily changes where the subject of the lie is an area where there is already an asymmetry between Meepers and Fleepers, because of the factor of expectation of belief.

That may be so, but it’s off-topic. I asked a specific question and I’m looking for specific answers.

One can debate whether the example of Fleepers and Meepers is a bad model for reality, but the main point is that (1) background statistics matter; and (2) it is not inherently contradictory to have two sub-populations that are equally honest while, at the same time, a particular claim made by a member of one of the populations is less likely to be true than the same claim made by a member of the other population.

Yeah · August 13, 2006, 3:24am

Thanks psychloan. These Bayesian analyses are always a great way to distinguish those who grasp logical reasoning from those who just can’t quite.

Noel_Prosequi · August 13, 2006, 4:27am

Ah. My apologies. I thought the OP’s question was asked with a broader context in mind. And of course, it assumes the validity of application of the Bayesian approach in such circumstances. The philosophy underlying Bayesian analyses is not simply a matter of “logic”, nor is it uncontentious. But since that’s not the debate you were after, my bad (slinks into corner).

cerberus · August 13, 2006, 6:13am

I think that we need to carefully distinguish our Bayes Bits:

1. The Bayesian perspective in probability, which views probabilities in terms of subjective belief, rather than as relative frequencies.
2. The use of Bayes Theorem, which makes use of conditional and prior probabilities to compute a contingent probability.

Noel_Prosequi · August 13, 2006, 6:43am

I don’t think we’re singing from different song-sheets: see (for want of a more formal cite)

Bayes’ theorem - Wikipedia

But the OP gets to decide what’s on-topic and psychloan has ruled the discussion out of bounds. So.

cerberus · August 13, 2006, 7:08am

If I’m remembering Bayes’ Theorem correctly, it uses the definitions of conditional and joint probabilities:

Pr{A | B} = Pr{A and B} / Pr{B}; Pr{B | A} = Pr{A and B} / Pr{A}; so then
Pr{A and B} = Pr{A | B}*Pr{B} = Pr{B | A}*Pr{A}.

So if we have a probability space partitioned into a disjoint union S = Union {A(i): i in I}, and an event B, then we can say that

Pr{A(j) | B} = Pr{A(j) and B} / Pr{B}
= Pr{A(j) and B} / [Sum over i in I of Pr{A(i) and B}]
= Pr{A(j) and B} / [Sum over i in I of Pr{B | A(i)}*Pr{A(i)}]
= [Pr{B | A(j)}*Pr{A(j)}] / [Sum over i in I of Pr{B | A(i)}*Pr{A(i)}]

Or briefly,

Pr{A(j) | B} = [Pr{B | A(j)}*Pr{A(j)}] / [Sum over i in I of Pr{B | A(i)}*Pr{A(i)}].
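
As a rough illustration of how that last formula applies to the fleeper/meeper numbers, here is a small Python sketch. The function name `posterior` and the choice of partition (A(0) = "was tickled", A(1) = "was not tickled", with B = "files a report") are assumptions made for this example. So is the use of the rounded counts given earlier in the thread, and the premise that every real incident is reported.

```python
def posterior(priors, likelihoods):
    """Return Pr{A(j) | B} for every j, given Pr{A(i)} and Pr{B | A(i)}."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # Pr{A(i) and B}
    evidence = sum(joint)                                  # Pr{B}
    return [j / evidence for j in joint]

# Rounded figures from earlier in the thread (illustrative).
p_tickled_fleeper = 100 / 1_000_000
p_tickled_meeper = 9_900 / 1_000_000
p_report_if_tickled = 1.0              # assumption: every real incident is reported
p_report_if_not = 3_333 / 1_990_000    # false-report rate per non-tickled person

# Index 0 = "was tickled", index 1 = "was not tickled".
fleeper = posterior([p_tickled_fleeper, 1 - p_tickled_fleeper],
                    [p_report_if_tickled, p_report_if_not])
meeper = posterior([p_tickled_meeper, 1 - p_tickled_meeper],
                   [p_report_if_tickled, p_report_if_not])

print(f"P(not tickled | fleeper files a report) = {fleeper[1]:.3f}")
print(f"P(not tickled | meeper files a report)  = {meeper[1]:.3f}")
```

Under those assumptions, the same per-person false-report rate still yields very different posteriors, because the prior probability of having been tickled differs between the two groups by a factor of about a hundred.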

Noel_Prosequi · August 13, 2006, 9:52am

Forgive me. I am not sure what point you are advancing here. I don’t doubt that Bayesians can derive by manipulation of mathematical objects things which they call “probabilities” (and it may be that the language here is the cause of, and solution to, all of life’s problems). I don’t doubt that much economic theory is built on Bayesian models. My point is one of legitimacy of application. Cribbing from Wiki again,

This doesn’t mean that the frequentists are wrong. It may be that limitations which constrain them are inherent, or that there is a subtle difference in the meaning of the word “probability” which each is using.

And it may be that the point of difference I am perceiving is cultural. I am a lawyer. Use of probabilistic language to apply to past events in the way described in the OP is (to me) the vice of “arguing from the general to the particular”. It is (again, to my mind) the same vice as arguing that any individual criminal is “probably” guilty a priori, because most people who are charged are guilty. Economists may have a different take, but I remain to be convinced that the maths adds any more than it would to a debate between an Einsteinian and a Newtonian, in which the Newtonian said, “But look! My equations are internally logically consistent! They give answers, according to the terms defined therein! They must therefore be true!”

For what it’s worth, I remember some years ago having professional dealings with heavy-hitter statisticians, and being told that most of them weren’t Bayesians. Times may have changed, but I gathered that the reasoning was something like that which I have tried to articulate above.

psychloan · August 13, 2006, 1:51pm

And that’s an interesting and important question. Which I believe belongs in Great Debates!!

ultrafilter · August 13, 2006, 1:55pm

Addressing the limitations of a model is a necessary part of statistical analysis, so no, that sort of discussion isn’t off-topic.

The important point that I haven’t seen addressed here is that you’re never assessing whether a meeper tickled a fleeper; you’re assessing whether this meeper tickled this fleeper. There will be other information present that outweighs any probabilistic analysis.

holmes · August 13, 2006, 2:17pm

Further, psychloan is using this thread to buttress his claims in the thread that spawned this. If the conclusion he’s drawn is flawed because this isn’t the method to use to determine things like honesty, lying and credibility, then that limitation should be acknowledged… whether or not the math is correct.

It would, IMO, be dishonest not to.

psychloan · August 13, 2006, 2:26pm

Hey, if any moderators happen to be reading this, could this discussion be moved to great debates? I believe that my original factual question has been answered, but other issues have arisen that are worth debating.

you_with_the_face · August 13, 2006, 2:40pm

The likelihood that Fleeper Smith’s claim is false is no different than Meeper Brown’s. If they are both equally honest, both are equally likely to lie or tell the truth.

Your conclusion should read thusly: in a representative sample of hard-copy Fleeper reports, what are the chances that a randomly selected one will be false? Is that probability larger or smaller than if that experiment is done with Meeper reports?
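
The sampling experiment described here (pull a report at random from each group's pile and check whether it is false) could be sketched as a small simulation. This is only an illustration under assumptions: it uses the rounded counts from earlier in the thread, assumes every real incident is reported, and the names `simulate_reports` and `sampled_share` are invented for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

POPULATION_PER_GROUP = 1_000_000
TICKLED = {"meeper": 9_900, "fleeper": 100}   # rounded figures from the thread
FALSE_RATE = 3_333 / 1_990_000                # per non-tickled person, same for both groups

def simulate_reports(group):
    """Return one boolean per report filed by the group; True means the report is false."""
    reports = [False] * TICKLED[group]        # assumption: every real incident is reported
    for _ in range(POPULATION_PER_GROUP - TICKLED[group]):
        if random.random() < FALSE_RATE:      # a non-tickled person files a false report
            reports.append(True)
    return reports

sampled_share = {}
for group in ("meeper", "fleeper"):
    pile = simulate_reports(group)
    draws = random.choices(pile, k=10_000)    # pull reports at random, with replacement
    sampled_share[group] = sum(draws) / len(draws)
    print(f"{group}: {len(pile)} reports in the pile, "
          f"sampled false share {sampled_share[group]:.3f}")
```

Both groups lie at the same per-person rate in this sketch, yet the share of false reports in the two piles comes out very different, which is the distinction between a statement about individuals and a statement about a sample of reports.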

You are trying to apply probability backwards.

And there’s no reason to rule the points that Noel raised off-sides. If your assumptions are invalid then so will be your answer, since Bayesian reasoning is more than just making numbers add up to the right thing. If you only see this question in terms of math, it will be difficult for you to word your conclusion in a way that is logical. Hence your problems in the Pit thread.
