Question About Bayesian Analysis

This was actually inspired by a question that came up in a ‘pit’ thread. I’ve changed the question that came up so as to make it less inflammatory. I believe my analysis is correct, but I would like somebody who is comfortable with Bayesian analysis to tell me if I’m off base:

There is a population of 2,000,000 people made up of 1,000,000 fleepers and 1,000,000 meepers.

There is an offense among these people known as tickling. However, fleepers only tickle meepers; and meepers only tickle fleepers.

Every time a person is tickled, he reports it to the Tickling Authority who investigates the claim.

Unfortunately, there are also a large number of FTR’s: False tickling reports. It is known that 25% of all tickling reports are false.

Every year, the Tickling Authority receives approximately 13,333 tickling reports.

Two other facts of note: For reasons that are not entirely clear, there is a tremendous imbalance in the incidence of tickling in the two populations:

99% of tickling incidents involve fleepers tickling meepers. Only 1% go the other way. (These are tickling incidents that are investigated and confirmed by the tickling authority.)

Lastly, members of both groups are equally likely to lie. Thus, a fleeper that has not been tickled is just as likely to file an FTR (false tickling report) as a meeper who has not been tickled.

The question is as follows: If a fleeper lodges a tickling report, what is the likelihood that it is false? Same question for a meeper. And are the two probabilities different?

I hope I didn’t make too many mistakes, but here is my take:

We have 13333 reports. 25% of those are false, which leaves about 10000 true incidents, assuming that all incidents are reported.

Among those are
9900 fleeper-on-meeper and
100 meeper-on-fleeper incidents.

There are 3333 false reports and 1990000 people who haven’t been tickled. So the probability of someone who hasn’t been tickled reporting incorrectly is about 0.0017.

Therefore the
990100 meepers who haven’t been tickled file about 1658 false reports and the
999900 fleepers who haven’t been tickled file about 1675 false reports.

Meepers file 9900 true and 1658 false reports (=11558): about 14% are false.
Fleepers file 100 true and 1675 false reports (=1775): about 94% are false.

(please excuse my totally unsystematic rounding)
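For anyone who wants to check the arithmetic without the rounding, here is a minimal Python sketch of the same calculation. The numbers are the ones from the problem statement; the function and variable names are made up for illustration, and it relies on the same assumptions (every incident is reported, no one is tickled twice, and untickled people in both groups file false reports at the same rate).

```python
# Sketch of the calculation above. All figures come from the problem
# statement; the function name and parameter names are illustrative.
def false_report_shares(population_per_group=1_000_000, reports=13_333,
                        false_fraction=0.25, fleeper_on_meeper_share=0.99):
    """Return (meeper, fleeper) fraction of reports that are false."""
    false_reports = reports * false_fraction
    true_reports = reports - false_reports

    # Meepers are the victims in fleeper-on-meeper incidents, so they
    # file the true reports for that share of incidents.
    meeper_true = true_reports * fleeper_on_meeper_share
    fleeper_true = true_reports - meeper_true

    # Assumption: no one is tickled more than once, so the untickled
    # head count is the group size minus the number of true incidents.
    meeper_untickled = population_per_group - meeper_true
    fleeper_untickled = population_per_group - fleeper_true

    # Assumption: both groups lie at the same per-person rate.
    rate = false_reports / (meeper_untickled + fleeper_untickled)
    meeper_false = rate * meeper_untickled
    fleeper_false = rate * fleeper_untickled

    return (meeper_false / (meeper_true + meeper_false),
            fleeper_false / (fleeper_true + fleeper_false))


meeper_share, fleeper_share = false_report_shares()
print(f"meeper reports that are false:  {meeper_share:.1%}")
print(f"fleeper reports that are false: {fleeper_share:.1%}")
```

Run as written, this should land on roughly the same 14% and 94% figures.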

That sounds about right to me, but it assumes that:

  1. no one is tickled more than once
  2. no one who was actually tickled filed an additional false report
  3. false tickling reports arise completely ex nihilo, as opposed to stemming from quasi-tickling-like incidents followed by misunderstandings, etc.

1 and/or 2 being false would have a modest impact resulting in meeper reports being more likely to be false and fleeper reports being less likely, but I’m too lazy to do the math. 3 being false could have a huge impact, depending on the specifics, to the point of making the entire question unanswerable.

This is about that Bricker pit thread I’ve been avoiding reading, isn’t it?

Lol. Yes, unfortunately I seem to be the only person posting in the thread at the moment who understands Bayesian reasoning.

Thank you. I came up with about the same numbers. I’m not crazy. Lol.

I get 0.1376835 and 0.9460227 letting OpenOffice Calc do the math (and rounding).

We need a smiley depicting eyes glazing over.
Thank you and goodnight.

Thanks for starting a thread that takes the heat out of that Pit thread. Chernobyl ain’t in it, down there. I’m glad of the opportunity to express a view without having to be aligned with the vitriol.

For mine, the part of your problem quoted above is (a) difficulty. Let us assume that Meepers and Fleepers have an exactly equivalent genetic and cultural predisposition to lying, as a general proposition. Lying, however, is context dependent. In any individual case and on any particular topic, whether a person lies or not is powerfully affected by the extent to which they expect to be believed. Thus, whereas the lying rate of Meepers and Fleepers will be exactly equal on some neutral subject such as what they ate for breakfast, it necessarily changes where the subject of the lie is an area where there is already an asymmetry between Meepers and Fleepers, because of the factor of expectation of belief.

Partly for these reasons, I doubt that probabilistic analyses, Bayesian or otherwise, are of any use in determining the truth or otherwise of past events. The allegation is true or false. There was tickling or there was not. (We needn’t be troubled by considerations at the margin such as misunderstandings, etc, for present purposes). Schrodinger’s cat does not apply at the macro level.

Future events, maybe (what is my chance, as a Meeper, of being tickled if I go out tonight?). And of course probability is useful in describing large numbers of cases (for purposes of resource allocation and the like).

But not past events. What 100 or 1000 or 1000000 other people in similar circumstances may have done has no bearing on whether a particular Fleeper or Meeper was a tickler/ticklee. The facts of past events have crystallised. The problem that we may not know reliably what those facts are is not one that can meaningfully be answered by probability analysis of different cases.

Different scenario. The vast majority of the people the police arrest are guilty. We know this from pleas of guilty, verdicts at trial, etc. When you add them up, about 90%+ of people charged are convicted in most jurisdictions. Does that help us tell whether a particular suspect before a particular jury is guilty or not? Of course not. He is either guilty or not according to the facts of his case. If we don’t have those facts, then supposed probability analysis adds nothing. It provides an illusion of significance, but is in reality an error of the order of “looking for the sixpence under the lamppost”.

That may be so, but it’s off-topic. I asked a specific question and I’m looking for specific answers.

One can debate whether the example of Fleepers and Meepers is a bad model for reality, but the main point is that (1) background statistics matter; and (2) it is not inherently contradictory to have two sub-populations that are equally honest while at the same time, a particular claim made by a member of one of the populations is less likely to be true than the same claim made by a member of the other population.
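To make point (1) concrete, here is a rough Python sketch that keeps the per-person lying rate identical in both groups and only varies how lopsided the true incidents are. The population, report count and 25% false fraction are from the original problem; the function name and the list of imbalance values tried are my own illustrative choices, and it assumes every incident is reported and nobody is tickled twice.

```python
# Sketch: same honesty in both groups, different base rates of real incidents.
# Figures are from the problem statement; names and the sweep values are
# illustrative assumptions.
def false_share_by_group(fleeper_on_meeper_share, population_per_group=1_000_000,
                         reports=13_333, false_fraction=0.25):
    """Return (meeper, fleeper) fraction of reports that are false."""
    false_reports = reports * false_fraction
    true_reports = reports - false_reports
    # True reports are filed by the victims of each kind of incident.
    true_by_group = {
        "meeper": true_reports * fleeper_on_meeper_share,
        "fleeper": true_reports * (1 - fleeper_on_meeper_share),
    }
    # Assumption: one incident per victim, so untickled = size - victims.
    untickled = {g: population_per_group - t for g, t in true_by_group.items()}
    # Same per-person false-report rate for both groups.
    rate = false_reports / sum(untickled.values())
    return tuple(
        rate * untickled[g] / (true_by_group[g] + rate * untickled[g])
        for g in ("meeper", "fleeper")
    )


for share in (0.50, 0.75, 0.90, 0.99):
    meeper, fleeper = false_share_by_group(share)
    print(f"{share:.0%} fleeper-on-meeper: "
          f"meeper reports false {meeper:.1%}, fleeper reports false {fleeper:.1%}")
```

With a 50/50 split the two groups' reports are equally likely to be false; the gap comes entirely from the imbalance in real incidents, not from any difference in honesty.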

Thanks psychloan. These Bayesian analyses are always a great way to distinguish those who grasp logical reasoning from those who just can’t quite.

Ah. My apologies. I thought the OP’s question was asked with a broader context in mind. And of course, it assumes the validity of application of the Bayesian approach in such circumstances. The philosophy underlying Bayesian analyses is not simply a matter of “logic”, nor is it uncontentious. But since that’s not the debate you were after, my bad (slinks into corner).

I think that we need to carefully distinguish our Bayes Bits:

  1. The Bayesian perspective in probability, which views probabilities in terms of subjective belief, rather than as relative frequencies.

  2. The use of Bayes Theorem, which makes use of conditional and prior probabilities to compute a contingent probability.

I don’t think we’re singing from different song-sheets: see (for want of a more formal cite)

Bayes’ theorem - Wikipedia

But the OP gets to decide what’s on-topic and psychloan has ruled the discussion out of bounds. So.

If I’m remembering Bayes’ Theorem correctly, it uses the definitions of conditional and joint probabilities:

Pr{A|B} = Pr{A and B} / Pr{B}; Pr{B|A} = Pr{A and B} / Pr{A}; so then
Pr{A and B} = Pr{A|B}*Pr{B} = Pr{B|A}*Pr{A}.

So if we have a probability space partitioned into a disjoint union S=Union {A(i): i in I}, and an event B, then we can say that

Pr{A(j)|B} = Pr{A(j) and B} / Pr{B}
= Pr{A(j) and B} / [Sum over i in I of Pr{A(i) and B}]
= Pr{A(j) and B} / [Sum over i in I of Pr{B|A(i)}*Pr{A(i)}]
= [Pr{B|A(j)}*Pr{A(j)}] / [Sum over i in I of Pr{B|A(i)}*Pr{A(i)}]

Or briefly,

Pr{A(j)|B} = [Pr{B|A(j)}*Pr{A(j)}] / [Sum over i in I of Pr{B|A(i)}*Pr{A(i)}].
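As a sanity check, the partition form can be plugged straight into the fleeper case. This is only a sketch: the `posterior` helper is a made-up name, the partition is A(1) = "was tickled" and A(2) = "was not tickled", B = "files a report", and the inputs are the rounded figures used earlier in the thread (100 tickled fleepers, 3333 false reports spread over 1990000 untickled people, every real incident reported).

```python
# Sketch of Bayes' theorem over a finite partition. The helper name and
# the choice of partition are illustrative assumptions, not from a library.
def posterior(priors, likelihoods):
    """Given Pr{A(i)} and Pr{B|A(i)} for each i, return Pr{A(i)|B} for each i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # Pr{A(i) and B}
    total = sum(joint)                                    # Pr{B}
    return [j / total for j in joint]


# Fleeper case, using the thread's rounded figures.
pr_tickled = 100 / 1_000_000               # Pr{A(1)}: fleeper was tickled
pr_not_tickled = 1 - pr_tickled            # Pr{A(2)}: fleeper was not tickled
pr_report_if_tickled = 1.0                 # assumes every incident is reported
pr_report_if_not = 3333 / 1_990_000        # false-report rate per untickled person

pr_true, pr_false = posterior(
    [pr_tickled, pr_not_tickled],
    [pr_report_if_tickled, pr_report_if_not],
)
print(f"Pr(not tickled | fleeper filed a report) = {pr_false:.3f}")
```

Swapping in 9900 for the 100 gives the corresponding meeper figure.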

Forgive me. I am not sure what point you are advancing here. I don’t doubt that Bayesians can derive by manipulation of mathematical objects things which they call “probabilities” (and it may be that the language here is the cause of, and solution to, all of life’s problems). I don’t doubt that much economic theory is built on Bayesian models. My point is one of legitimacy of application. Cribbing from Wiki again,

This doesn’t mean that the frequentists are wrong. It may be that limitations which constrain them are inherent, or that there is a subtle difference in the meaning of the word “probability” which each is using.

And it may be that the point of difference I am perceiving is cultural. I am a lawyer. Use of probabilistic language to apply to past events in the way described in the OP is (to me) the vice of “arguing from the general to the particular”. It is (again, to my mind) the same vice as arguing that any individual criminal is “probably” guilty a priori, because most people who are charged are guilty. Economists may have a different take, but I remain to be convinced that the maths adds any more than it would to a debate between an Einsteinian and a Newtonian, in which the Newtonian said, “But look! My equations are internally logically consistent! They give answers, according to the terms defined therein! They must therefore be true!”

For what it’s worth, I remember some years ago having professional dealings with heavy-hitter statisticians, and being told that most of them weren’t Bayesians. Times may have changed, but I gathered that the reasoning was something like that which I have tried to articulate above.

And that’s an interesting and important question. Which I believe belongs in Great Debates!!

Addressing the limitations of a model is a necessary part of statistical analysis, so no, that sort of discussion isn’t off-topic.

The important point that I haven’t seen addressed here is that you’re never assessing whether a meeper tickled a fleeper; you’re assessing whether this meeper tickled this fleeper. There will be other information present that outweighs any probabilistic analysis.

Further, psychloan is using this thread to buttress his claims in the thread that spawned this. If the conclusion he’s drawn is flawed because this isn’t the method to use to determine things like honesty, lying and credibility, then that limitation should be acknowledged… whether or not the math is correct.

It would IMO, be dishonest not to.

Hey, if any moderators happen to be reading this, could this discussion be moved to great debates? I believe that my original factual question has been answered, but other issues have arisen that are worth debating.

The likelihood that Fleeper Smith’s claim is false is no different than Meeper Brown’s. If they are both equally honest, both are equally likely to lie or tell the truth.

Your conclusion should read thusly: in a representative sample of hard-copy Fleeper reports, what are the chances that a randomly selected one will be false? Is that probability larger or smaller than if that experiment is done with Meeper reports?

You are trying to apply probability backwards.

And there’s no reason to rule the points that Noel raised off-sides. If your assumptions are invalid then so will be your answer, since Bayesian reasoning is more than just making numbers add up to the right thing. If you only see this question in terms of math, it will be difficult for you to word your conclusion in a way that is logical. Hence your problems in the Pit thread.