Probabilities of Relevance

Let’s say that I am evaluating the probability that a person is guilty of murder. The person whom the police have decided to prosecute is tall, was a friend of the victim, was out drinking the night of the crime, lived in or was in town the day of the murder, and suffered a sprained shoulder the night of the murder.

Further, I am told that these are the probabilities of guilt for each of these pieces of evidence:

3 in 4 times, the murderer will be someone who was drunk.

2 in 5 times, the murderer will be someone who was a friend (but not lover).

9 in 10 times, the murderer will be someone tall (not 10 in 10, because the CSI person may have misjudged something).

1 in 5 times, a wound like a sprained shoulder would occur during a murder as opposed to random chance / drunkenness / etc.

How do I compute my confidence rating that the tall friend was guilty? I.e., like 95% certainty, given a population size of about 100,000 people who would have been in town at the time, and a friend pool of about 30.

Not a homework problem.

I don’t think you can, at least not without additional information. The probabilities you list are not necessarily independent, and the probability that someone is guilty if they have those characteristics must not be confused with the probability that they have those characteristics if they are guilty.

Your example is reminiscent of a famous case, of which one discussion can be found here: The Professor, the Prosecutor, and the Blonde With the Ponytail

Just like in the other new probability thread, I’d start with the negative odds and multiply: 1/4 * 3/5 * 1/10 * 4/5 to get the probability of not guilty. But the issue of independence and other factors is going to seriously mess with this. So it’s at best a weak lower bound.
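For reference, the arithmetic in that product works out as follows. This is only a sketch of the calculation as posted: it treats the four figures from the OP as independent, which other replies dispute, so the result should not be read as an actual probability of innocence.

```python
from fractions import Fraction
from math import prod

# Complements of the four figures in the OP (3/4, 2/5, 9/10, 1/5).
complements = [Fraction(1, 4), Fraction(3, 5), Fraction(1, 10), Fraction(4, 5)]

# Multiplying them is only meaningful if the four are independent (an assumption).
product = prod(complements)

print(product)         # 3/250
print(float(product))  # 0.012
```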

You have a probabilistic fallacy here. You cannot determine the probability of guilt of the individual based on the probabilities you have listed.

Let’s simplify the scenario to illustrate, and just say that the person who the police have in custody was out drinking the night of the crime.

If 3 out of 4 murderers were drunk, you can only determine the probability of a given murderer having been drunk, not of a drunk person being a murderer. It tells you absolutely nothing about the probability of the person in custody being a murderer.

Now let’s use reductio ad absurdum. Suppose that the four probabilities you listed were all 100%. Every murderer was drunk, a platonic friend, tall, and injured. Mathematically, this tells you absolutely nothing about whether any given individual who was drunk, a friend, tall, and injured committed a murder.

P.S. The police don’t decide whom to prosecute.

In order to find out the actual probability you would have to know how many friends she had, how many people were drunk on the night, how many tall people are in the area, how many people suffered a sprained shoulder on that night. Then use those numbers to create a probability for each.

This. If 75% of the time the murderer is drunk, but in general in the area at the time 90% of people are drunk, then it is the sober people you should be suspicious of.
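To put rough numbers on that, here is a minimal sketch using Bayes' theorem. The 75% and 90% figures are the ones in the post above; the prior of 1 in 100,000 assumes exactly one murderer among the OP's town population and no other evidence, which is an assumption made here for illustration.

```python
from fractions import Fraction

def posterior(p_trait_given_guilty, p_trait, prior):
    """Bayes' theorem: P(guilty | trait) = P(trait | guilty) * P(guilty) / P(trait)."""
    return p_trait_given_guilty * prior / p_trait

# Assumption: exactly one murderer among ~100,000 people in town (figure from the OP).
prior = Fraction(1, 100_000)

# From the post above: 75% of murderers are drunk, 90% of the town is drunk.
p_drunk_given_guilty = Fraction(3, 4)
p_drunk = Fraction(9, 10)

p_guilty_given_drunk = posterior(p_drunk_given_guilty, p_drunk, prior)
p_guilty_given_sober = posterior(1 - p_drunk_given_guilty, 1 - p_drunk, prior)

print(p_guilty_given_drunk)                         # 1/120000
print(p_guilty_given_sober)                         # 1/40000
print(p_guilty_given_sober / p_guilty_given_drunk)  # 3
```

Under these assumptions a sober person is three times as likely to be the murderer as a drunk one, even though most murderers are drunk.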

Even with all those obviously needed (zeroth order) probabilities, you would still want, as Thudlow Boink implies, (first order) covariances — e.g. a drunk murderer might be more likely to sprain his shoulder than a sober murderer; the victim may have few tall friends. And 2nd-order, and 3rd-order stats as well, for that matter.

Whatever model you finally come up with, the probability you compute will be a reflection of your ignorance as well as your knowledge.

Nitpick. This is not quite true. It tells you that the given individual has a non-zero probability of being the murderer, whereas if any of those characteristics failed, the probability would be zero. That is, each characteristic could prove innocence but even together they could not prove guilt without further information.

Ironically, this “not a homework problem” sounds like something I read in my introductory statistics textbook at college (although it wasn’t a “homework problem,” but an example of comparing non-independent events); at a trial, the prosecution listed the probabilities of a person in the general public having each of a number of particular traits, and when they were multiplied together, concluded that only one person could have all of them, and “coincidentally” the defendant did. The jury convicted, but an appeals court overturned it, in part because some of the traits were clearly not independent.

You may also have to look at it both ways - the claim is that 3/4 of murderers are drunks, but what fraction of drunks are murderers?

What would be the math for that?
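One way to write it is Bayes' theorem. In the notation below (the symbols are labels chosen here, not taken from the thread), M is "this person is the murderer" and D is "this person was drunk":

```latex
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{P(D)}
```

The OP supplies only P(D | M) = 3/4. The base rates P(M) and P(D), i.e. how many people could have done it and what fraction of them were drunk, would have to come from somewhere else.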

That Don Guy, the case you’re referring to is probably the same one linked by Thudlow Boink.

If you assume that these probabilities (all 100% as in my own example) are from an omniscient being and are absolute and apply to all cases past, present, and future, then, yes, if there are 10,000 people who have all four characteristics then the probability that any one of them is the murderer is 0.01%. If someone is not in that group then the probability for that person would be 0.

However, you can never have such information. Given only the information in the OP, these probabilities seem to be historical. Therefore they cannot be used to develop the probability that any given suspect having all four characteristics is the murderer.

If the OP is an attempt to idealize a situation as a probability and statistics exercise then I would probably suggest a different scenario.