View Full Version : Statistics & profiling problem


sailor
01-01-2004, 10:12 AM
Reading a thread going on in GD this question occurred to me.

The town of X has a population of 10,000, of which 10% are black and 90% are white.
Of blacks, 20% are criminals and 80% are honest while of whites 5% are criminals and 95% are honest.
          Honest   Bad   Total
Whites:     8550   450    9000
Black:       800   200    1000
Totals:     9450   650   10000

The bad guys are about equally bad so that about 70% of crimes are committed by white bad guys and 30% by black bad guys.
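The percentages quoted in the question can be recomputed from the table. What follows is only a minimal sketch: the dictionary layout and the function names are made up for illustration, and it assumes (as the post does) that every bad guy is equally active. Note that the honest column sums to 9350.

```python
from fractions import Fraction

# Town of X as stated in the post: (honest, bad) head counts per group.
town = {"white": (8550, 450), "black": (800, 200)}

def crime_share(town):
    """Share of crimes per group, assuming every bad guy is equally active."""
    total_bad = sum(bad for _, bad in town.values())
    return {group: Fraction(bad, total_bad) for group, (_, bad) in town.items()}

def criminal_rate(town):
    """Fraction of each group that is bad."""
    return {group: Fraction(bad, honest + bad)
            for group, (honest, bad) in town.items()}

shares = crime_share(town)
rates = criminal_rate(town)
honest_total = sum(honest for honest, _ in town.values())

print({g: float(s) for g, s in shares.items()})  # white is about 0.69, black about 0.31
print({g: float(r) for g, r in rates.items()})   # white 0.05, black 0.2
print(honest_total)                              # 9350
```

The two sets of numbers answer different questions: `shares` describes a randomly chosen criminal, while `rates` describes a randomly chosen resident of each group.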

A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both so he has to choose which one to go after.

Hmmm, he thinks, 70% of crimes are committed by white guys which is more than twice the percentage of crimes being committed by black guys. I should therefore go after the white guy.

Oh, but wait, taken individually the white guy has a probability of 5% of being a criminal but the black guy has a probability of 20% which is 4 times higher. I should therefore go after the black guy.

What is the true, correct and definitive answer? Who should the officer chase?

don't ask
01-01-2004, 10:18 AM
He should wound one in the leg and then chase the other.

ultrafilter
01-01-2004, 11:47 AM
This looks like a problem that can be addressed by Bayes' theorem (http://mathworld.wolfram.com/BayesTheorem.html), but I'm not 100% sure how to apply it here.

Orbifold
01-01-2004, 01:38 PM
First the math, then the explanation of why the math is probably not applicable anyway.

So: mathematically, assuming that the black man is randomly selected from the set of all black people in the town of X, then the probability that he is a "bad guy" is 20% as you've stated. So in that sense, the black guy is more likely to be a "bad guy" than the white guy (assuming that the white guy is a randomly selected white guy as well). The fact that there are more white bad guys than black ones means that if you select a bad guy at random then he's more likely to be white than black, but that's not what's happening in the situation you describe.

But of course, these people aren't randomly selected are they? It's not as if the census bureau picked one white man and one black man at random and dropped them at the scene. So statistical conclusions that assume the people are randomly selected, like the one in the previous paragraph, are of questionable merit at best. For all the cop knows there's a secret white bad guy convention going on down the street.

The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.

robcaro
01-01-2004, 03:20 PM
Well, while the cop is doing the math, both of them will get away. I suggest that we should agree with don't ask. Too bad that we have a racial system in the US. Cops shouldn't be asked to judge between black or white.

Napier
01-01-2004, 03:26 PM
>Hmmm, he thinks, 70% of crimes are committed by white guys which is more than twice the percentage of crimes being committed by black guys. I should therefore go after the white guy.

I think this is a red herring. It would have influenced how likely it would be that one white and one black are leaving the scene, but that's a given and so the statistic is not useful here.

>Oh, but wait, taken individually the white guy has a probability of 5% of being a criminal but the black guy has a probability of 20% which is 4 times higher. I should therefore go after the black guy.

I think this is the relevant statistic. It's also the correct conclusion, if we're given that the cop should take advantage of the statistical information and also that he has no other criteria to use. These aren't necessarily trivial points.

>The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.

I think this isn't correct, in two ways. The statistical information is certainly "information to suggest which suspect is more likely to be a criminal", though it is certainly weak information that still leaves the criminal's identity quite uncertain. Weak information is certainly more useful than none at all. And, I don't hear anything about assumptions here, founded or not.

BTW, sailor, you're not afraid to ask the tough questions, are you? Perhaps we should substitute "army men" for "white men" and "navy men" for "black men". After all, why offend anybody needlessly?

Omphaloskeptic
01-01-2004, 03:54 PM
It's not quite right to suggest that just because you don't know everything you can't know anything. In the absence of complete information you can only make guesses, of course, and these guesses will sometimes be wrong; but this is a far cry from having no information at all. Using probability theory is one way of trying to make better guesses: i.e. guesses which are, statistically, more likely to be correct. Of course this relies on various assumptions (e.g. statistical independence between various random variables), and if your assumptions are badly wrong then you might make worse guesses instead.

A Bayesian might analyze the situation as follows:

Initially the officer sees two men, X and Y, running from the scene, and assigns each of them equal prior probabilities of being the criminal:
    P(Y,¬X) = 1/2        (prior probability that Y is criminal and X is not)
    P(X,¬Y) = 1/2        (prior probability that X is criminal and Y is not)
As he approaches, he sees further that X is black (Xb) and Y is white (Yw). He now updates his priors with the new information, using Bayes' Law (linked by ultrafilter above):
    P(Y,¬X | Xb,Yw) = P(Xb,Yw | Y,¬X) P(Y,¬X) / P(Xb,Yw)
    P(X,¬Y | Xb,Yw) = P(Xb,Yw | X,¬Y) P(X,¬Y) / P(Xb,Yw)
The denominator (basically a normalization factor) is computed by summing over all possibilities:
    P(Xb,Yw) = P(Xb,Yw | Y,¬X) P(Y,¬X) + P(Xb,Yw | X,¬Y) P(X,¬Y) .

Now how can we compute P(Xb,Yw | Y,¬X) (the probability that X is black and Y is white, given that Y is criminal and X is not)? We might assume (in the absence of more comprehensive statistical information) that the probability that X is black does not depend on whether Y is criminal, i.e. that the actions of X and Y are basically independent of each other. In this case we can write
    P(Xb,Yw | Y,¬X) = P(Xb | ¬X) P(Yw | Y) .
Now P(Xb | ¬X) (the probability that X is black, given that he is not a criminal) and P(Yw | Y) (the probability that Y is white, given that he is a criminal) are given in the statistical tables provided:
    P(Xb | ¬X) = 800/9350 = 16/187 [note typo "9450" in table]
    P(Yw |   Y) = 450/650   = 9/13
So
    P(Xb,Yw | Y,¬X) =   (16/187) (9/13) = 144/2431
and similarly
    P(Xb,Yw | X,¬Y) = (171/187) (4/13) = 684/2431
so
    P(Xb,Yw) = (144/2431)(1/2) + (684/2431)(1/2) = 414/2431
and the updated posterior probabilities are

    P(Y,¬X | Xb,Yw) = (144/2431)(1/2) / (414/2431) =   72/414 =  4/23        (posterior probability that Y is criminal and X is not)
    P(X,¬Y | Xb,Yw) = (684/2431)(1/2) / (414/2431) = 342/414 = 19/23        (posterior probability that X is criminal and Y is not)

This approach, of updating a priori probabilities to reflect new information using Bayes' Law, is called Bayesian inference. It's an extremely useful statistical technique, though (as with all statistical techniques) it relies on having valid data and assumptions.
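The arithmetic above can be checked with exact fractions. This is a sketch under the same assumptions as the post (exactly one of the two suspects is guilty, equal 1/2 priors, the two suspects are independent draws); the function and variable names are illustrative only.

```python
from fractions import Fraction

# Counts from the corrected table: honest and bad residents per group.
HONEST = {"white": 8550, "black": 800}
BAD = {"white": 450, "black": 200}

def race_given_status(counts, race):
    """P(race | status), where counts holds the head counts for one status."""
    return Fraction(counts[race], sum(counts.values()))

def posterior_black_guilty(honest, bad):
    """P(black suspect is the criminal | one black and one white suspect)."""
    prior = Fraction(1, 2)
    # Likelihood of seeing (black, white) if the black suspect is guilty.
    like_black = race_given_status(bad, "black") * race_given_status(honest, "white")
    # Likelihood of seeing (black, white) if the white suspect is guilty.
    like_white = race_given_status(bad, "white") * race_given_status(honest, "black")
    evidence = like_black * prior + like_white * prior
    return like_black * prior / evidence

p_black = posterior_black_guilty(HONEST, BAD)
print(p_black, 1 - p_black)  # 19/23 4/23
```

Because the priors are equal, they cancel; the result depends only on the ratio of the two likelihoods, 684 : 144.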

Depending on how you define "better," it may be appropriate to consider factors besides the probabilistic results; some of these other factors come into arguments against profiling. (The game-theoretic aspects of policing, for example, mean that the actions the officer takes in this round may affect the approaches taken by the parties in future rounds.)

Orbifold
01-01-2004, 04:59 PM
Originally posted by Napier
And, I don't hear anything about assumptions here, founded or not.

The assumption I was referring to was the statistical assumption that the men at the scene were randomly selected from the set of all such men in the city. It's unstated, but it's not possible to compute probabilities (such as Omphaloskeptic has done) without making some such assumption, as Omphaloskeptic has correctly noted.

Omphaloskeptic
01-01-2004, 05:31 PM
Originally posted by Orbifold
The assumption I was referring to was the statistical assumption that the men at the scene were randomly selected from the set of all such men in the city. It's unstated, but it's not possible to compute probabilities (such as Omphaloskeptic has done) without making some such assumption, as Omphaloskeptic has correctly noted.

Well, yes, but I think your post was unnecessarily pessimistic with

>The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.

Of course all assumptions are mathematically "unfounded" (that's why they're assumptions), but some assumptions are more reasonable than others. If we require that the officer just throw up his hands in defeat, or maybe flip a coin, unless he can determine with certainty which one is the criminal, then we may as well disband the police forces. Even if he captured both suspects (like don't ask suggests) and questioned them both, and one admitted guilt and the other protested his innocence, ... well, maybe they're both good liars. He still has "no information" unless he makes the (unfounded) assumption that they don't have some bizarre reason to conspire to fool him. I think it's reasonable to consider this last scenario rather unlikely, though, and more generally, reasonable to make inferences based on statistical information. Mathematical certainty is never going to be possible here; all you can hope to do is use all the information you have and try to quantify your errors.

drachillix
01-01-2004, 06:44 PM
Originally posted by don't ask
He should wound one in the leg and then chase the other.

So which one do you shoot in the leg?

ltfire
01-01-2004, 07:01 PM
Originally posted by drachillix
So which one do you shoot in the leg?

Well. the black one, of course. It's well known that they are bred to run faster.
:eek: :)

sailor
01-01-2004, 07:39 PM
Omphaloskeptic, the more I think about it the more confused I am and I arrive at contradictory results but none are what you say.

One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.

Omphaloskeptic
01-01-2004, 08:24 PM
Originally posted by sailor
Omphaloskeptic, the more I think about it the more confused I am and I arrive at contradictory results but none are what you say.

One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.

No, the presence of these two people at the scene does in fact yield new information: that they are the most likely suspects. Before you saw these two, your list of suspects was the 10000 people in the city, each with probability 0.01%. Now it's (simplifying by pretending these are the only two suspects left) just these two, each with (for now ignoring the information about their race) 50% probability.

Beforehand, your suspicion that it was probably a white guy was weighted by the fraction of white suspects (large relative to the fraction of black suspects). But since the other whites in town are no longer suspects, they (along with the rest of the 9998 other people in town) don't weight the results any longer. What's important now is how likely each of these two individuals is to be a criminal.

drachillix
01-01-2004, 08:37 PM
Originally posted by sailor
One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.

Or another analysis, if I understand Omphaloskeptic's post correctly.

With no information other than a crime scene, you would assume it was 70% likely a white suspect because whites commit 70% of the crimes. However, once a black person is noted among the two individuals leaving the scene, your stats say any given black guy is about 4 times as likely to be a criminal as any given white guy, so the odds shift toward him.

This seems to correlate with the results of 4/23 chance of white perp vs 19/23 chance of black perp.

sailor
01-01-2004, 09:02 PM
I'm going to have to think about that. I say this because I remember stumbling over another probability problem in another thread long time ago. It was the Monty Hall 3 door problem. I argued adamantly and suddenly, a little light turned on and I realized I was wrong. But it took quite some time and effort. This time I am going to be more prudent, especially because I am even less sure this time. Let me give it some thought.

KP
01-01-2004, 09:19 PM
Actually, there are a whole passel of assumptions here. Please indulge me as I toy with some of them. If you just want the "meat" [genuine objection] skip to the bolded "Final Analysis".

ISSUE #1: you presume "Criminals" are always guilty, and "Honest men" never are - all criminals are born guilty, and all honest men are forever sainted. This is necessarily false, and weakens the derivation above. In real life, of course, all men are born innocent, and some become guilty at some point. Without knowing the rate of (convicted) first offenses - the rate at which honest men become criminals - we can only guess (or approximate) the actual probabilities.

ISSUE #2: what do real cops think if they see two men running away from the scene of a crime? They think both are guilty! I'll get back to this point

ISSUE #3: If all guilty men are "always guilty" then we get some truly funky situations. If a penny is stolen from the till, every criminal in the vicinity must be guilty - a neat trick, and a ludicrous assumption for the real world.

You may argue that the unstated principle ('assumption' is more like it) that only one is guilty is implicit in the formulation of this problem (and similar problems of its class). But that forces a revision of the math.

You may say "the crime was murder, and only one gunshot was heard" (we'll ignore the crime of 'fleeing the scene'- which only casts more doubt on the presumption that "only criminals commit crimes") We still have to refine the derivation.

If both men were chosen at random, then a Real Cop's initial assumption (both are criminals for fleeing the scene) is correct 1% of the time, and the "reader's assumption" (there is only one criminal) is correct 23% of the time - but 76% of the time there'd be no victim at all!

The apparently relevant denominator is ambiguous. It's either
A) "all cases where at least one man is a criminal" (which I think is the only physically arguable case); or
B) "all cases where only one man is a criminal" (whose only merit is conforming to a common presumption)

Pa = 450/9000 + 200/1000 - [(450/9000) * (200/1000)] = 24%
(the subtracted term removes the 'double-counted' overlap)
Pb = 450/9000 + 200/1000 - 2*[(450/9000) * (200/1000)] = 23%
(the subtracted term removes *both* counts of the overlap)
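The percentages in this post (1%, 23%, 76%, and Pa, Pb) follow from treating the two men as independent random draws, one from each group. A short sketch with exact fractions, under that assumption; the variable names are illustrative:

```python
from fractions import Fraction

p_white_bad = Fraction(450, 9000)   # 5% of whites are criminals
p_black_bad = Fraction(200, 1000)   # 20% of blacks are criminals

both = p_white_bad * p_black_bad                     # both men are criminals
neither = (1 - p_white_bad) * (1 - p_black_bad)      # neither is a criminal
at_least_one = p_white_bad + p_black_bad - both      # Pa
exactly_one = p_white_bad + p_black_bad - 2 * both   # Pb

for name, p in [("both", both), ("exactly one", exactly_one),
                ("at least one", at_least_one), ("neither", neither)]:
    print(f"{name}: {float(p):.0%}")
# both: 1%
# exactly one: 23%
# at least one: 24%
# neither: 76%
```

The three disjoint cases (both, exactly one, neither) sum to 100%, which is a quick sanity check on the inclusion-exclusion terms.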

In either case, we can only rely on the prevalence of "guilty" men in each racial subgroup, 20% (B) vs 5% (W), but the analysis is completely flawed, because in a random matching of candidates the actual murderer would be White roughly 450/650 of the time. Why were we wrong?


FINAL ANALYSIS
--------------------
The flaw is: crimes are committed solely by criminals. Any number which includes innocent people merely obfuscates the issue. This includes statistics like "percentage of criminals" or "total population", which are affected by the number of innocents. Changing the number of innocents does not affect the probability that a man is guilty.

The statistically valid denominator is the number of potential murderers in town. All the Chinese in China, or all the innocent Chinese in town, are irrelevant.

ISSUE #4: the problem was set up to demand a black man and a white man at the scene. The universe of random pairings, however, does not reflect the underlying events: each black man, guilty or innocent, is "forced to flee" 9-10.6875 times as many hypothetical crime scenes as each white man. This dramatically overestimates the possibility of black guilt.

EFFECT 1: In the universe of cases where a black man is guilty, the analysis provided pairs him against 9000 white men [case A] or 8550 innocent white men [case B], while each white man is mathematically paired against only 1000 or 800. If you wrote a chart of every "random pairing", each black man's name would appear either 9000 or 8550 times, while each white man's name would only appear on the chart 1000 or 800 times. Throw a dart at this chart, and the result isn't "fair": the black men have 9000/1000 or 8550/800 times as many slips in the hat.

To illustrate this, make the numbers more extreme. Say that there are only 10 blacks in the city (and 1 black criminal) while there are 10 million whites (and 50,000 white criminals). By (improper) Bayesian analysis, that black man must commit virtually all the crime that occurs in his vicinity, while the 50,000 white criminals sit on their hands. Within a year (before new stats can be issued), every black person would be shot many times, and no white man would be caught in this situation.

EFFECT 2: Relying on prevalence in subpopulations will skew all future statistics, even if the population sizes are equal. In cases where men of both races are suspected, the black criminals will always be caught [and be counted], and the white criminals will always escape [and not be counted]. This effect is strongest as the population with the highest pre-existing

Effect 1 increases as the black fraction of the total population decreases. Effect 2 increases as the black fraction increases. Racial profiling is a Big Lose for blacks, guilty or innocent.

sailor
01-01-2004, 09:59 PM
Well, at least the rest of us understand the problem as it was enunciated even if we're still having problems finding or understanding the solution.

Omphaloskeptic
01-01-2004, 10:07 PM
Originally posted by KP
EFFECT 2: Relying on prevalence in subpopulations will skew all future statistics, even if the population sizes are equal. In cases where men of both races are suspected, the black criminals will always be caught [and be counted], and the white criminals will always escape [and not be counted].

This is true, and it is undeniably a problem with profiling schemes. It would be nice to have an equal chance of catching all criminals (whether black or white) and to prevent white criminals from gaming the system by committing their crimes when blacks are around (thus hassling innocent blacks), but it would also be nice to maximize our chances of catching criminals. These two desires are in conflict here, and there's no simple resolution; they can't simultaneously be maximized for the problem stated.

There's a reason my first response stopped where it did (with the computation of the posterior probabilities) and not with a complete answer to the OP's question

Originally posted by sailor
What is the true, correct and definitive answer?

which is, as you point out (I only mentioned it in passing), a much more difficult question, and not one with a GQ answer. What do you want to maximize?

Originally posted by KP
Racial profiling is a Big Lose for blacks, guilty or innocent.

Well, yes, but black crime is (in the real world; the statistics in the OP don't cover victims) also a Big Lose for blacks, guilty or innocent. It's not clear, to me at least, which causes worse problems in practice.

KP
01-01-2004, 10:12 PM
My point is that the problem, as it is enunciated, is flawed.

The "understanding" you cite is what creates the confusion. My final analysis points out why.

Sorry about the rest of the stuff, if it didn't interest you. I probably should have taken the racial example less literally. However, picking assumptions apart is not irrelevant. It is an essential first step in mathematical analysis. If this were a more strictly mathematical forum, the assumptions would have been picked apart a lot more already. That's just part of how the game of math (vs. arithmetic) is played.

Omphaloskeptic
01-01-2004, 10:41 PM
Originally posted by KP
My point is that the problem, as it is enunciated, is flawed.

The "understanding" you cite is what creates the confusion. My final analysis points out why.

Sorry about the rest of the stuff, if it didn't interest you. I probably should have taken the racial example less literally. However, picking assumptions apart is not irrelevant. It is an essential first step in mathematical analysis. If this were a more strictly mathematical forum, the assumptions would have been picked apart a lot more already. That's just part of how the game of math (vs. arithmetic) is played.

I don't think the problem is necessarily flawed. If you wanted to use this Bayesian analysis as a Rigorous Proof of The Efficacy And Rightness of Profiling, well, that would be a problem, yes. But the question, at least as I understood it, was somewhat more limited in intent than that: just a tool for understanding the mathematical reasons behind profiling.

viking
01-02-2004, 12:06 AM
Then again, the problems raised so far can also be used to understand The Problems With Profiling.

The guys could be running from the scene because they are afraid of being wrongly harassed by the police. And in fact we would expect this to be more of a factor for the member of the group that profiling suggests we should harass. So, the person that profiling says we should harass is running because he knows he's going to get blamed, and the guy that profiling says we shouldn't harass is running because he's guilty. So the game theory approach to the problem says we should catch the guy that we don't think we should catch.

And suddenly the reasoning starts reminding me way too much of The Princess Bride.

DanBlather
01-02-2004, 12:26 AM
Luis Tiant, a Cuban born pitcher for the Red Sox, was stopped for jogging in the predominantly white neighborhood in which he lived. The cop's reasoning was that a black man running in that vicinity must be fleeing from something.

sailor
01-02-2004, 08:33 AM
Please! This is not a discussion about racial profiling so I would appreciate it if those wishing to discuss racial profiling would take their discussion to another thread. My question is 100 percent a probability problem and the relevant data is given in my OP. The question is: given the information the cop has, what can he infer with regards to who is the most likely suspect? That is the question. The fact that in the real world there is no town named X, or other similar real-world considerations, are totally outside the problem I am presenting. I am interested in learning about statistics, not about the racial problems of America.

Napier
01-02-2004, 09:21 AM
>A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both so he has to choose which one to go after.


>This is not a discussion about racial profiling

sailor, you strain me!

sailor
01-02-2004, 09:54 AM
Q. Delta flight #45678 leaves Washington DC and flies towards its destination located 6000 miles away at a speed of 2000 mph. How long will it take to get there?

A. There is no Delta flight #45678
There is no airport located in the District of Columbia
No commercial airliner can fly at 2000 mph

You think that would be considered the correct answer in most schools and colleges? :rolleyes:

It seems you cannot mention race, guns, abortion, religion and a number of other hot issues, no matter how in passing, without the thread being hijacked to hell by people who want to argue their pet subjects. Please take it to another thread. This is about mathematical probabilities.

hroeder
01-02-2004, 10:37 AM
Common sense math answer that will surely infuriate mathematicians:

There's a 9% chance that of the two men one will be white and one will be black.

So we now know that this combination is rare in this town.

Calculating the probability that the Black dude will be a criminal gives a 1.8% possibility looking at the entire population; or a 2.9% possibility looking at the criminal population.

Calculating the probability that the White dude will be a criminal gives a 4.5% possibility looking at the entire population. It gives a 5.4% possibility looking at the population of criminals.

Thus about 5-2 odds that chasing the White dude will get the cop a collar, given no other information.

pjd
01-02-2004, 10:48 AM
ALL RIGHT, ALL RIGHT STOP IT !

IT WAS ME, OK ?

I admit it.

sailor
01-02-2004, 11:08 AM
hroeder, you have thrown in a new perspective which had not occurred to me and which makes sense but I think you may have the numbers wrong. Let me do some math and see what I get.

sailor
01-02-2004, 11:16 AM
A) The white guy did it. Then the black guy can be *any* of the 1000 black guys in town, good or bad.

B) The black guy did it. Then the white guy can be *any* of the 9000 white guys in town, good or bad.

So, we have 9000 cases in which the black guy did it versus 1000 cases in which the white guy did it. The reasoning seems correct to me and yet the conclusion seems wrong in that the process does not even take into account the numbers of bad guys in each group. I assume the process is wrong but I cannot put my finger on it. Someone?
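One possible repair of this count, sketched under the thread's working assumption that exactly one of the two men is guilty: in case A the white guy must be one of the 450 bad whites and the black guy one of the 800 honest blacks, and in case B the black guy must be one of the 200 bad blacks and the white guy one of the 8550 honest whites. Treating every such pair as equally likely is an assumption, and the names below are illustrative.

```python
from fractions import Fraction

HONEST = {"white": 8550, "black": 800}
BAD = {"white": 450, "black": 200}

# Each mixed-race case is one guilty man plus one innocent man of the other race.
white_did_it = BAD["white"] * HONEST["black"]   # 450 * 800
black_did_it = BAD["black"] * HONEST["white"]   # 200 * 8550

p_white = Fraction(white_did_it, white_did_it + black_did_it)
print(white_did_it, black_did_it, p_white)  # 360000 1710000 4/23
```

Counted this way, the numbers of bad guys in each group do enter the result, and the tally agrees with the 4/23 posterior computed earlier in the thread.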

sailor
01-02-2004, 11:26 AM
I still can't put my finger on it but it seems more reasonable to use strictly the numbers of bad guys in which case the probability is 70% that the white guy did it. I feel quite confident with that.

At least I suppose that is a valid first analysis. Then I suppose it could be refined but I still doubt the result got by Omphaloskeptic that the probability is only 17% that the white guy did it. I think there *has* to be some error along the way.

The answer may be neither number but right now, if I had to pick between 17% and 70% I would pick 70% that it was the white guy. I would also bet that the correct answer is not higher than 70% but not as low as 17%.

KP
01-02-2004, 11:49 AM
Hroeder. I largely agree, but with a significant distinction.

There are 450 white criminals and 200 black criminals. With no data on any differences in extent of the criminal records of black vs. white criminals, we can only consider them equally "criminal", making the odds 450:200 (69.23%) that any crime is committed by a white.

As I said earlier (but apparently, not well): the number or race of innocent people is irrelevant. By the assumptions of the problem, if the entire town were present at a speech and the speaker was assassinated, you'd have 650 suspects, not 10,000. If 1000 (honest) FBI agents were present, there are still 650 suspects, not 11,000. If the town's 1000 Quakers, black and white, are never criminals, and didn't attend, there are still 650 suspects, not 9000. If "all criminals are equal", then there's a 69.23% chance the assassin is white, even if *everyone* flees the scene in fear.

If ONLY criminals commit crimes, only criminals are relevant. It's always possible to calculate the criminal %age for any group (e.g. by last digit of SSN), but that merely compares the irrelevant innocents to the relevant criminals.

The population of China is also innocent of this crime; should we use the ratio of black criminals to native Chinese to decide between the black and white suspect? The fact is: the ratios of black and white criminals to native Chinese actually give the right answer! Why? Because we're using the *same* denominator to compare both sets of criminals. "Innocents of the same race" or "total population of same race" are inappropriate, skewing denominators.

sailor
01-02-2004, 11:59 AM
Yes, I think KP's analysis is right. At least initially the odds are 450/200 that a white guy did it and that is the conclusion if the officer arrives at the scene and sees no one.
The question is whether seeing a white guy and a black guy adds any relevant information which would alter the odds and that is where I am not clear but right now I can't see how it does.

If it did it would lower the figure somewhat but I can't see how it could lower it to anything close to 17%; even 50% would seem a stretch.

Omphaloskeptic
01-02-2004, 02:03 PM
OK, let's change the parameters and see if the conclusion (that the odds are 450:200) makes sense.

Here are the original statistics (corrected):

          Honest   Bad   Total
White:      8550   450    9000
Black:       800   200    1000
Totals:     9350   650   10000

Now let's consider an extreme case (Example 2 for reference):

          Honest   Bad   Total
White:      9350   450    9800
Black:         0   200     200
Totals:     9350   650   10000

Notice that I have not changed the "Bad" column at all. I've only changed the 800 honest blacks to whites; in our new fictional town of East X there are no honest blacks. Is it still the case that the odds are 450:200 that a white guy did it, even though the black guy is a guaranteed criminal?

I can make this even more extreme by adding honest whites (Example 3):

          Honest   Bad     Total
White:    999350   450    999800
Black:         0   200       200
Totals:   999350   650   1000000

In the bustling metropolis of Lower East X, with a white criminal proportion of 0.045% and a black criminal proportion of 100%, there are still 450 bad whites and 200 bad blacks. Is it still 450:200 for the white guy?

I'm trying to come up with a more intuitive example, but (to me at least) these two extreme cases make the 450:200 odds seem very unreasonable. Let me try another explanation of the new information: You have two populations (whites and blacks). The relevant feature of these populations is that the smaller population has a higher proportion of criminals. You see two suspects at the scene, and (by the tacit assumption in the problem) exactly one is guilty. The fact that exactly one of the suspects is from the smaller, higher-crime population is relevant because it is a relatively unlikely occurrence.
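The three towns can be compared side by side. This is a rough sketch, assuming exactly one of the two suspects is guilty and that every (guilty man, innocent man) pair of mixed race is equally likely; the function name and the tuple layout are invented for the illustration.

```python
from fractions import Fraction

def p_black_guilty(honest_white, bad_white, honest_black, bad_black):
    """Chance that the black suspect is the guilty one, given one suspect of
    each race and exactly one guilty man among the two."""
    black_cases = bad_black * honest_white   # guilty black, innocent white
    white_cases = bad_white * honest_black   # guilty white, innocent black
    return Fraction(black_cases, black_cases + white_cases)

towns = {
    "X (original)": (8550, 450, 800, 200),
    "East X (Example 2)": (9350, 450, 0, 200),
    "Lower East X (Example 3)": (999350, 450, 0, 200),
}
for name, counts in towns.items():
    print(name, p_black_guilty(*counts))
# X (original) 19/23
# East X (Example 2) 1
# Lower East X (Example 3) 1
```

With no honest blacks in town there is no way for the white suspect to be the guilty one while the black suspect is innocent, so the "Bad" column alone cannot settle the question.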

Omphaloskeptic
01-02-2004, 02:12 PM
Originally posted by KP
As I said earlier (but apparently, not well): the number or race of innocent people is irrelevant. By the assumptions of the problem, if the entire town were present at a speech and the speaker was assassinated, you'd have 650 suspects, not 10,000. If 1000 (honest) FBI agents were present, there are still 650 suspects, not 11,000. If the town's 1000 Quakers, black and white, are never criminals, and didn't attend, there are still 650 suspects, not 9000. If "all criminals are equal", then there's a 69.23% chance the assassin is white, even if *everyone* flees the scene in fear.

If ONLY criminals commit crimes, only criminals are relevant. It's always possible to calculate the criminal %age for any group (e.g. by last digit of SSN), but that merely compares the irrelevant innocents to the relevant criminals.

The population of China is also innocent of this crime; should we use the ratio of black criminals to native Chinese to decide between the black and white suspect? The fact is: the ratios of black and white criminals to native Chinese actually give the right answer! Why? Because we're using the *same* denominator to compare both sets of criminals. "Innocents of the same race" or "total population of same race" are inappropriate, skewing denominators.

The reason that the innocent populations are not irrelevant here is that there are two people at the scene, one guilty and the other innocent. This is statistically more likely if the guilty man is from a more-guilty population, but also if the innocent man is from a more-innocent population.

(The innocent population of China, etc., is irrelevant to the question not because of its innocence but because it is not part of the problem universe. Charles Manson is also irrelevant to the question for the same reason, even though he's guilty.)

sailor
01-02-2004, 05:35 PM
I have to admit I am totally perplexed by what seems such a simple probability problem. Several analyses all seem correct to me and yet they cannot all be correct as they lead to contradictory results. Then I can find fault with all of them. As I say, I am perplexed.

KP
01-02-2004, 08:08 PM
Omphaloskeptic makes a valid point. I'm not sure I agree with his explanation, but his numeric argument is solid.

I've played with some numbers, and it appears that there is some effect beyond the population size effect [Effect #1 in my original post] that seems to be nonlinear - perhaps a factor of x/(1-x) - "the ratio between empty space and filled space" in a fixed size container. Such equations behave differently when compared quantities are near each other ["in the same regime"] vs. when they are far apart.

I overlooked this factor in the original example where both quantities were in the same regime. I'll need to work out a single unified equation to understand in which regimes various factors predominate. I'll report back then.

But again: Omphaloskeptic is right: I was wrong to say the number of innocents is completely irrelevant. (We all know what happens when people ignore, bury or forget their mistakes, so I try to emphasize mine.)

Achernar
01-02-2004, 08:23 PM
I don't know Bayesian math, but it seems to me that whatever solution is correct must fulfill the following criteria, if B is the fraction of blacks which are bad and W is the fraction of whites which are bad, regardless of the overall populations:
If B = W, then color and honesty are independent variables, and so it's exactly 50/50.
If B = 0, then it's 100% certain that the cop should pursue the white one, and if W = 0, then it's 100% certain that the cop should pursue the black one.
If B = 1, then it's 100% certain that the cop should pursue the black one, etc.
If B = W = 1 or B = W = 0, then the best choice is undefined.
Does anyone disagree with these? Because some of the solutions so far contradict them.
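
A quick way to test any candidate solution against these criteria is to code it up and try the limit cases. The sketch below is only one candidate: it assumes the two men are independent random draws from their groups and that exactly one of them is bad (an assumption the thread itself debates). The function name p_black_guilty is made up for illustration.

```python
from fractions import Fraction

def p_black_guilty(B, W):
    """P(the black suspect is the bad one | exactly one of the two is bad).

    B and W are the fractions of blacks and whites who are bad.
    Returns None when the condition "exactly one is bad" can never happen.
    """
    black_only = B * (1 - W)   # black bad, white honest
    white_only = W * (1 - B)   # white bad, black honest
    total = black_only + white_only
    if total == 0:
        return None            # B = W = 0 or B = W = 1: undefined
    return black_only / total

# Numbers from the original post: B = 20%, W = 5%
print(p_black_guilty(Fraction(1, 5), Fraction(1, 20)))    # 19/23

# The limit cases listed above
print(p_black_guilty(Fraction(3, 10), Fraction(3, 10)))   # 1/2
print(p_black_guilty(Fraction(0), Fraction(1, 20)))       # 0
print(p_black_guilty(Fraction(1), Fraction(1, 20)))       # 1
print(p_black_guilty(Fraction(1), Fraction(1)))           # None
```

Under those assumptions the overall population sizes drop out entirely, which is what "regardless of the overall populations" asks for.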

sailor
01-02-2004, 09:42 PM
Achernar, in simple terms what you are saying is that the cop should look at the probabilities that each individual is bad. Your assertion makes sense from a certain point of view and this is the problem I am having: different POVs yield different answers. Look:

The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys. So he should chase Mr White. The logic seems irrefutable.

But now he forgets about the crime when he sees the two guys and thinks (as you point out) the probability that a black individual is a bad guy is four times higher than the probability that a white guy is bad. So he should chase Mr Black.

Both reasonings look sound to me and yet they can't both be true because they are contradictory.

Probably the correct answer to the probability of the white guy being who did it lies somewhere between the upper 70% and the lower 20% which each scenario yields. But that is still a huge range and I do not know what logical process would combine both factors to give the correct answer.

ultrafilter
01-02-2004, 10:04 PM
Originally posted by Achernar
I don't know Bayesian math, but it seems to me that whatever solution is correct must fulfill the following criteria, if B is the fraction of blacks which are bad and W is the fraction of whites which are bad, regardless of the overall populations:
If B = W, then color and honesty are independent variables, and so it's exactly 50/50.
If B = 0, then it's 100% certain that the cop should pursue the white one, and if W = 0, then it's 100% certain that the cop should pursue the black one.
If B = 1, then it's 100% certain that the cop should pursue the black one, etc.
If B = W = 1 or B = W = 0, then the best choice is undefined.
Does anyone disagree with these? Because some of the solutions so far contradict them.

Case 4 is contained in case 1, isn't it?

Achernar
01-02-2004, 10:11 PM
Well, as I wrote it, case 4 contradicts specific cases of 1, 2, and 3. But I meant for it to supersede them.

Omphaloskeptic
01-02-2004, 10:18 PM
I'm trying to write a more detailed explanation of my original (Bayesian) answer, but for now let me try to explain why I think your 70% solution is wrong:
Originally posted by sailor
The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys.

This is fine so far. In the absence of any other information, his universe of suspects is all criminals in town, ~70% of whom are white.

So he should chase Mr White. The logic seems irrefutable.

This is where I think the problem lies. Once he sees the two people running from the scene, these are his two suspects (in reality, these would be "primary" suspects and there would still be some suspicion cast on criminals not at the scene, but I've ignored this). He's not choosing any more whether to chase all 450 white criminals or all 200 black criminals, just whether to chase this one particular white (who may be a criminal) or this one particular black (who may be a criminal).

I hesitate to bring up a different example, because I think analogies usually just confuse the issue, but here's my attempt (NB: this is primarily aimed at the specific problem I see with the reasoning above, and not as a complete analogy):
I have a large number of nickels and dimes, all minted in either 2001 or 2002; 20% of the nickels and 5% of the dimes are dated 2001. From this stock I (randomly) pick 30 dimes and 3 nickels, placing them face down on the table in front of you. I allow you to turn over either all the dimes or all the nickels. If you want to find a 2001 coin, which should you choose?
Clearly you turn over the dimes; though each individual dime is less likely to be a 2001, there are ten times as many, enough that there is more likely a 2001 dime on the table than a 2001 nickel.

Now I (randomly) remove all but a single nickel and a single dime from the table. I peek at their heads and tell you that exactly one is a 2001 coin. Which coin do you turn over to find the 2001?
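
For anyone who wants to check the coin numbers, here is a rough sketch of both stages. It assumes each coin's date is an independent draw at the stated rates (20% of nickels and 5% of dimes dated 2001); the variable and function names are only illustrative.

```python
p_nickel, p_dime = 0.20, 0.05    # chance a given coin is dated 2001
n_nickels, n_dimes = 3, 30       # coins placed on the table

def p_at_least_one(p, n):
    """Chance that at least one of n independent coins is a 2001."""
    return 1 - (1 - p) ** n

# Stage 1: turn over all the dimes or all the nickels
any_dime = p_at_least_one(p_dime, n_dimes)
any_nickel = p_at_least_one(p_nickel, n_nickels)
print(f"some dime is 2001:   {any_dime:.3f}")
print(f"some nickel is 2001: {any_nickel:.3f}")

# Stage 2: one nickel, one dime, and exactly one of them is a 2001
nickel_only = p_nickel * (1 - p_dime)   # nickel is 2001, dime is not
dime_only = p_dime * (1 - p_nickel)     # dime is 2001, nickel is not
p_nickel_is_2001 = nickel_only / (nickel_only + dime_only)
print(f"the 2001 coin is the nickel: {p_nickel_is_2001:.3f}")
```

The first stage favours the dimes because there are ten times as many of them; in the second stage the counts no longer matter and only the per-coin rates are left.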

ultrafilter
01-02-2004, 10:19 PM
If B = W = 1, then both men are criminals, so chasing either man could result in catching a criminal, if not the perpetrator of this crime. So the best choice is to chase one of them--perhaps the one who is running more slowly.

If B = W = 0, then neither man is the criminal, but may be a witness. Again, the best choice is to chase one of them.

So it looks to me like case 1 contains case 4, and cases 2 and 3 should be taken to exclude B = W.

But I do agree that any solution must have this property.

Achernar
01-02-2004, 10:36 PM
Okay, I agree ultrafilter. I was thinking that the cop assumed that exactly one of the two was bad, in which case Case 4 would contradict this assumption. But I realize now that this assumption is not necessary.

KP
01-03-2004, 05:06 PM
I haven't had a chance to fully work out the unified equation, but I did come up with a few thoughts I thought I'd throw out. (I'm still working on these, too)

The key concept here seems to be "the universe of (applicable) possibilities" or the "gross denominator". When making a chart of possibilities, you must pick the correct rows and columns, or "counting cells" will mislead you.

As I earlier noted, making a matching grid of blacks vs whites produces a false bias. In such a grid, each additional white person increases the number of cells in each black criminal's row, but not any white criminal's column. But does a black man actually become "more probably guilty" if a white person moves to town?

To remove any obscuring 'intuitions', let's rename the conditions. The town is a former all-boys' school. Blacks and whites are girls and boys. 'Criminals' are 'drivers'. The crime becomes an accident where a truck hits a car, killing the driver, but not the one passenger. We'll assume all dating is in-school and heterosexual.

Just as "only criminals commit crimes", only drivers drive, but a license doesn't prove you weren't a passenger. A criminal record doesn't prove guilt in any later crime.

In this example, it's easier to see that to assess the odds that a girl died, you don't multiply by the number of "available boys" (vs. the girls available to the guys). The date happened. There was ample opportunity for it to happen (the number of potential partners amply exceeds the number of drivers). Since it is not a limiting condition, you should leave it alone. (Below, we'll see how Example C hits a limiting condition.)

Statistical "opportunity" can be illusory: pedestrians aren't run over 10x as much in cities with 10x more roads; in fact the accident rate is often higher with fewer roads. Dating -or having innocents near your crime- usually isn't limited by the number of possible partners, so the effect of more potential partners isn't calculable. When you hear of the accident, knowing that the accident was on a date (vs. with a same sex friend) may affect your assessment on a person-by-person (not gender) basis.

The fraction of licensed boys vs. girls (5% vs 20%) DOESN'T affect the probable gender of the victim. The fraction of drivers who are boys (69.23%) vs girls (30.77%) DOES.

"Licensure rate by gender" (criminality by race) is a sloppily framed statistic which could only be used if we felt we 'needed' to judge by raw gender, just as the original scenario was crafted to FORCE us to judge by race: the only answers we're allowed to give are "black" or "white"

An insurance company would go broke if they used "accidents per girl" instead of "Accidents per girl driver" to calculate rates. The scenario makes it sound like the cop MUST decide based on race, but in fact, he could chase the one who is closer, slower, wearing lighter colored clothing (easier to see at night), looks easier to subdue, is headed toward less concealing cover, or even choose one at random. A cop who sees two fleeing suspects and sees only race is a poor cop indeed.

Now let's remove race intuitions from Omphaloskeptic's most extreme Example C:

It's a post-Apocalyptic future after a cruel bioweapon killed almost all women. By tradition, all women drive (at first, they didn't dare ride with a man!) but almost no men are allowed to drive (they might catch the few women). After a century or so, women are no longer afraid; they are worshipped and protected.

It's very rare to see a man alone with a woman (women are 1/5000th of the city). Yet one day, a paper reports that -horrors- an accident killed the driver of a car containing a woman. The whole city wants to know: did a woman die?

Like the original scenario, it is an unlikely event cherry-picked to make a point, but does the extreme rarity change the conclusion we established above?

No, it doesn't! While I agree that, this time, it was probably a woman who was killed (a black who was guilty), the 100% prevalence of driving among women (criminality among blacks) is actually quite irrelevant

The apparently contradictory finding of Example C is not caused by the HIGH 100% rate, but by the ULTRA-LOW rates: prevalence of women, and prevalence of driving males (small number of blacks, and almost total noncriminality of whites in Example C)

To see this, let's see how changing the 100% rate affects the "most probable outcome":

DROPPING BLACK CRIMINAL RATE FROM 100% to 0.05% DOESN'T AFFECT EXAMPLE C
EVEN AT RATES SO LOW THAT NOT ONE SINGLE BLACK CRIMINAL EXISTS

        TOTAL    CRIMINALS  HONEST    Racial Prevalence  % CRIM
BLACK   200      200        0         1: 5000            100%
WHITE   999800   450        999350    1: 1.00020004      0.0450090018%
TOTAL   1000000  650        999350                       0.065%

B+W: 200.05   B guilty: 199.96   W guilty: 0.09

        TOTAL    CRIMINALS  HONEST    Racial Prevalence  % CRIM
BLACK   200      20         180       1: 5000            10%
WHITE   999800   450        999350    1: 1.0002000400    0.0450090018%
TOTAL   1000000  470        999530                       0.047%

B+W: 20.086   B guilty: 19.996   W guilty: 0.09

        TOTAL    CRIMINALS  HONEST    Racial Prevalence  % CRIM
BLACK   200      2          198       1: 5000            1%
WHITE   999800   450        999350    1: 1.0002000400    0.0450090018%
TOTAL   1000000  452        999548                       0.0452000000%

B+W: 2.0896   B guilty: 1.9996   W guilty: 0.09

        TOTAL    CRIMINALS  HONEST    Racial Prevalence  % CRIM
BLACK   200      0.2        199.8     1: 5000            0.1%
WHITE   999800   450        999350    1: 1.0002000400    0.0450090018%
TOTAL   1000000  450.2      999549.8                     0.0450200000%

        TOTAL    CRIMINALS  HONEST    Racial Prevalence  % CRIM
BLACK   200      0.1        199.9     1: 5000            0.05%
WHITE   999800   450        999350    1: 1.0002000400    0.0450090018%
TOTAL   1000000  450.1      999549.9                     0.0450100000%

B+W: 0.18998   B guilty: 0.09998   W guilty: 0.09

As you can see, the "criminality of blacks" is irrelevant in Omphaloskeptic's Example C. Even when there is not one single black criminal, but 450 white criminals, the ultra-low white rate makes black men "the likeliest candidate". I don't know what "0.1" black criminal is, but it's way less than "one single criminal".

Apparently even "bad thoughts" by a black man should affect a cop's decision of whom to chase, more than 450 actual White criminals, if the "white crime rate" is low enough.

Such 'small number effects' are non-linear enough to constitute a deliberately skewed sampling: e.g. a black population this small is deprived of the "statistical benefit" of "B/B" scenarios. At a black racial prevalence of 1:5000, the B/B effect is 0.00000004 while the 99.98% of white crime is buried in W/W scenarios.

This makes Example C so sensitive to black misdeeds, and so forgiving of the prospect of white misdeeds that it actually says you should arrest the black when there isn't a single black criminal, but there are hundreds of white criminals.

In fact, under Example C you could raise the White criminal rate to 100%, and the answer still wouldn't be "arrest the white man".

Interestingly, the most extreme case of an "Example C" scenario is "the only Chinese in town". If that person has a criminal record, Example C says he should be chased because his visible ethnicity has 100% criminality. Yet, in reality, the policeman should chase the white suspect: he can pick up the Chinese man later, but the white suspect is still unidentified.
What is the true, correct and definitive answer? Who should the officer chase?
Just because a quantity can be calculated, doesn't make it a sufficient basis for a decision. Personally, I'm more inclined to 'follow the math' than any other single factor, but I encounter situations daily when the math simply fails to provide the best solution in cases of limited information.

I find this problem interesting mathematically, but I think there is substantial reason to say that the cop's knowledge of racial statistics is no more relevant than a thousand other details he would have also seen. Some cops are known for giving speeding tickets to sports cars or even just "red" cars, but despite studies indicating that these are worse offenders, I would argue that targeting this "high risk population" would be a poorer global practice than ticketing 'at random'.

[I put red in quotes, because I don't have a cite on speeding rates by color.]

sailor
01-03-2004, 07:15 PM
>> Just because a quantity can be calculated, doesn't make it a sufficient basis for a decision.

Well, I may agree but I am not having to make any decision. I'd just like to know the answer. I have been stumped by probability problems before but this has to be in lesson 1 of Probability 101. It is the simplest probability case you can imagine, with just two variables. I'm pretty sure there's a correct answer hidden in there somewhere.

ultrafilter
01-03-2004, 08:24 PM
I'll agree with Achernar that the limit cases should behave as he specified, so that provides a guide for ruling out some solutions.

Achernar
01-03-2004, 08:31 PM
Originally posted by sailor
Achernar, in simple terms what you are saying is that the cop should look at the probabilities that each individual is bad. Your assertion makes sense from a certain point of view and this is the problem I am having: different POVs yield different answers. Look:

The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys. So he should chase Mr White. The logic seems irrefutable.

I just thought of a way to refute this. Suppose 10% of people are Scorpios, and, no surprise, 10% of crimes are committed by Scorpios. The cop shows up at the scene, and using this logic, concludes that it's 90% likely that it was not committed by a Scorpio. Checking two suspects' IDs, he sees that one is a Scorpio and the other is a Capricorn. Should he suspect the Capricorn more?

Lance Turbo
01-03-2004, 11:03 PM
You see a black guy and a white guy. There are exactly 9,000,000 possible black guy/white guy pairs.

Of those 9 million happy couples:

6,840,000 are an honest white guy and an honest black guy
1,710,000 are an honest white guy and a dishonest black guy
360,000 are a dishonest white guy and an honest black guy
90,000 are a dishonest white guy and a dishonest black guy

I believe we are assuming that we have one honest man and one dishonest man, so we can throw out 6,930,000 cases of honest/honest and dishonest/dishonest.

That leaves 2.07 million cases of which about 82.6% contain a dishonest black man.

Chase the black dude.
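
The pair counts above are easy to reproduce. This is just a sketch of the same counting argument, using the population table from the original post and the same assumption that exactly one of the two men is dishonest; the dictionary names are illustrative.

```python
# Population table from the original post
whites = {"honest": 8550, "bad": 450}
blacks = {"honest": 800, "bad": 200}

# Every (white guy, black guy) pairing, counted by honesty of each
pairs = {(w, b): whites[w] * blacks[b] for w in whites for b in blacks}
total_pairs = sum(pairs.values())

for (w, b), count in pairs.items():
    print(f"{w} white + {b} black: {count:,}")
print(f"total pairs: {total_pairs:,}")

# Keep only the pairs with exactly one dishonest man
black_only = pairs[("honest", "bad")]
white_only = pairs[("bad", "honest")]
p_black = black_only / (black_only + white_only)
print(f"dishonest man is the black one: {p_black:.1%}")
```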

KP
01-04-2004, 12:24 AM
Lance Turbo:

You absolutely CANNOT throw out the dishonest/dishonest cases and get the correct numeric answer; those cases are intrinsic to the problem. However, thus far, it seems likely to me they may not affect your final decision under an algebraic model, except possibly under very extreme discrepancies in population size or prevalence. The question looks to be even trickier under a discrete math model (criminals and citizens must come in integer units).

sailor said
but this has to be in lesson 1 of Probability 101. It is the simplest probability case you can imagine, with just two variables.
Well, technically, Probability 101 does tell you that "Probability is only good for predicting the behavior of a large number of samples. It can't predict single incidents, and is poor at small sample sizes." Also, this is *not* a two variable problem. That is at the root of the original "paradox" you cited. It's at least a three variable problem, where the third variable must be calculated by using two of the variables you provided. (see below)

I do understand what you mean, of course, but Probability is always a matter of inexact knowledge. I've been trying to prove to myself whether the 'Probability 101' model is the best possible approximation, the conditions where it is weakest (if any), and whether it either over-assumes or under-uses all available data. I had assumed that was the primary thrust of the thread, since the resolution to your OP has already been given. However, re-reading the thread, it's clear that not everyone is debating the same issues.

I apologize if my focus has caused confusion (apparently it's confused me at least once!), and I'll concede there's more than a small measure of 'Devil's Advocate' in it (I was taught that critical analysis is essential) I don't do it to annoy or mislead - it's actually a fair amount of work!

To make up for that, here's the derivation of...

The "Probability 101" answer:
We have T<w> white candies and T<b> black candies. Some of each are milk chocolate (M) inside and some are dark chocolate (D).

For a randomly selected candy of color c, M<c>:D<c> = odds that it has milk chocolate. The probability P<c> = M<c>/T<c>.

The number of milk chocolates in each color M<c> = P<c>*T<c>

HOWEVER, since M<c>=P<c>*T<c>, M<w> can be greater than M<b> even if P<b> is greater than P<w>, if and only if T<w>/T<b> is greater than P<b>/P<w>. This is the resolution of the apparent paradox in the OP. (This is a 3-variable problem. You must know P<w>, P<b> and T<b>/T<w>.)
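
To see that condition at work with the OP's numbers, here is a small sketch. The candies stand in for people, so T_w = 9000, T_b = 1000, P_w = 5% and P_b = 20% are taken from the original post; the function name milk_counts is made up for illustration.

```python
from fractions import Fraction

def milk_counts(P_w, P_b, T_w, T_b):
    """Number of milk chocolates of each colour: M<c> = P<c> * T<c>."""
    return P_w * T_w, P_b * T_b

# Values from the original post (whites/blacks, criminals = "milk")
P_w, P_b = Fraction(1, 20), Fraction(1, 5)
T_w, T_b = 9000, 1000

M_w, M_b = milk_counts(P_w, P_b, T_w, T_b)
print(M_w, M_b)                        # 450 200

# M<w> > M<b> exactly when T<w>/T<b> > P<b>/P<w>
size_ratio = Fraction(T_w, T_b)        # 9
rate_ratio = P_b / P_w                 # 4
print(size_ratio > rate_ratio, M_w > M_b)   # True True
```

So the larger group supplies more of the "milk" candies overall even though each of its candies is individually less likely to be one, which is the two-sided intuition from the start of the thread.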

Independence is not a trivial issue in problems like these. The "Monty Hall paradox" hinges on the issue of whether seemingly independent consecutive options are genuinely independent, and therefore whether they have equal probability.

If I draw a white candy and then a black candy from the bowl, they are independent events. Neither selection affects the other. The odds of each candy being milk chocolate are given by its respective ratio M<c>:D<c> in the bowl. P<c> can also be used.

If I draw two candies together, and then return them to the bowl, until I get a black and a white together, the two candies are still independent draws and the chances of each color being milk chocolate are still given by its respective P<c>.

HOWEVER, if I do the above until I have a black-white pair AND exactly one milk chocolate between them, the probability of the two colors being milk chocolate are no longer independent. in this case:

P<wb, md> = [P<w>*(1-P<b>)] + [P<b>*(1-P<w>)]

HOWEVER, this does NOT represent the crime case correctly. One of the parties committed the crime and must be a criminal, BUT the other party can be either a 'criminal' or 'honest' [i.e. his 'criminality' is immaterial]. We can no longer use the marbles or candies that elementary probability texts so adore.

P = [P<w>*(1)] + [P<b>*(1)] - [P<b>*P<w>]
because [P<w>*(1)] + [P<b>*(1)] double-counts the instances
where both men are criminals (once in P<w> and once in P<b>)

This tells us the probability of the situation, but it does not yet tell us exactly how to apportion the chances of guilt between the two suspects. Since the equation (and especially the third term) is symmetric with respect to P<w> and P<b>, we might think:

P<white guilty> = P<w> - [P<b>*P<w>]/2
P<black guilty> = P<b> - [P<b>*P<w>]/2
Note that the chances of black and white guilt are *not* independent.
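
Plugging the OP's rates into these two tentative formulas gives concrete numbers. This is only a sketch of the apportionment proposed just above (splitting the both-criminal term evenly), not a settled answer; the final normalisation step is an added assumption so the two shares can be compared as percentages.

```python
from fractions import Fraction

P_w, P_b = Fraction(1, 20), Fraction(1, 5)   # 5% and 20% from the OP

# Probability that at least one of the two is a criminal
P_any = P_w + P_b - P_b * P_w

# Tentative split: each side gives up half of the both-criminal term
white_guilty = P_w - (P_b * P_w) / 2
black_guilty = P_b - (P_b * P_w) / 2

print(P_any, white_guilty, black_guilty)     # 6/25 9/200 39/200

# Assumption: normalise by P_any to get shares that sum to 1
print(white_guilty / P_any, black_guilty / P_any)   # 3/16 13/16
```

The two pieces do add back up to P, and the split lands at 18.75% white versus 81.25% black.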

I'm not 100% certain that this is the best possible answer, but it's definitely as far as you'd get in Probability 101. There are several potential issues that this simple derivation does not address [such as the surprising 'fractional black criminal' in my last post]. When you get away from simple models (e.g. by moving from continuous to discrete mathematics) interesting results often fall out of the cracks.

sailor
01-04-2004, 07:32 AM
Lance Turbo, I believe your analysis of counting possible pairs is correct but I do not think you should leave out bad-bad pairs so I would redo it like this:

You have 1,710,000 + 90,000 = 1,800,000 cases with a bad black guy.
You have 360,000 + 90,000 = 450,000 cases with a bad white guy
Therefore the probabilities are exactly 80% and 20% respectively. I believe this is the correct solution until someone can point out why it is wrong (and I am sure someone will come along shortly and do just that).

Lance Turbo
01-04-2004, 08:32 AM
Originally posted by KP
Lance Turbo:

You absolutely CANNOT throw out the dishonest/dishonest cases, and get the correct numeric answer;

KP, I think you can throw out the dishonest/dishonest cases for the same reason that you can throw out the honest/honest cases. In both those situations, it doesn't matter who you chase. When making a decision based on probabilities, there is no reason to look at cases in which your decision is irrelevant.

sailor
01-04-2004, 03:05 PM
Lance Turbo, no you can't do that. You have to take into account the bad+bad cases. We can disregard the good+good just because it is assumed in the definition of the problem I gave that there is a bad guy but the definition does not say there cannot be two bad guys. Those cases *do* affect the answer as I have already shown in my math in the previous post. I think your analysis is good except for that point.

Lance Turbo
01-04-2004, 03:54 PM
Originally posted by sailor
...because it is assumed in the definition of the problem I gave that there is a bad guy but the definition does not say there cannot be two bad guys.

The definition of the problem could have used a little work, but that is not really important right now. (An unambiguously defined problem would be something like, "If at least one of the two men were involved in the crime, what is the chance that the police officer has apprehended a guilty man if he captured the white man?")

The question you asked was...

Originally posted by sailor
Who should the officer chase?

The answer is the black guy, and I'm pretty sure that that is no longer in dispute.

However, the question of whether or not to include bad/bad cases is still unanswered, but it doesn't matter if all you are trying to do is decide who to chase.

Without including bad/bad cases:

82.6% chase black
17.4% chase white

With including bad/bad cases:

79.2% chase black
16.7% chase white
4.2% it doesn't matter

In both scenarios it is clear you should chase the black guy. The important thing is that odds of success for chasing black are exactly 4.75 times greater than the odds for chasing white in both situations. No matter what percentage of cases involve bad/bad chase black is always 4.75 times more likely than chase white.

If the question was, "If at least one of the two men were involved in the crime, what is the chance that the police officer has apprehended a guilty man if he captured the white man?", you would have to include the bad/bads to answer correctly: 20.8%.

Also I should add that you calculated your percentages incorrectly when you included the bad/bads. It should be 83.3% chase black and 20.8% chase white. They add up to more than 100% precisely because they both include cases in which your decision doesn't matter. (You included the 90000 bad/bads in your denominator twice.)

Your error should underline the unimportance of including the bad/bads to decide who to chase. Including them once leads to the answer - chase black. Leaving them out leads to the answer - chase black. Including them 1.5 times (like you sort of did) leads to the answer - chase black. Including them ten times leads to the answer - chase black.
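
The claim that the 4.75 ratio survives any weighting of the bad/bad cases can be checked directly. A minimal sketch, using the three case counts from this thread; the weight parameter (how many times the bad/bad pairs are counted in the total) is an illustrative device, not something from the problem statement.

```python
black_only, white_only, both = 1_710_000, 360_000, 90_000

def chase_shares(weight):
    """Shares of cases when the bad/bad pairs are counted `weight` times."""
    total = black_only + white_only + weight * both
    return black_only / total, white_only / total, weight * both / total

for weight in (0, 1, 1.5, 10):
    p_black, p_white, p_either = chase_shares(weight)
    print(f"weight {weight}: black {p_black:.1%}, white {p_white:.1%}, "
          f"doesn't matter {p_either:.1%}, ratio {p_black / p_white:.2f}")
```

The individual percentages move as the weight changes, but the ratio of "chase black" to "chase white" stays at 1,710,000 / 360,000 = 4.75 because the bad/bad cases only enter the shared denominator.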

sailor
01-04-2004, 05:28 PM
Originally posted by Lance Turbo
The definition of the problem could have used a little work, but that is not really important right now. (An unambiguously defined problem would be something like, "If at least one of the two men were involved in the crime, what is the chance that the police officer has apprehended a guilty man if he captured the white man?")

Ok, so let's refine & improve the question asked in the OP and word it: "What are the probabilities that the black man did it and what are the probabilities that the white man did it?" While the original wording was not as clear I think it was understandable to people that that was the question asked.

Also I should add that you calculated your percentages incorrectly when you included the bad/bads. It should be 83.3% chase black and 20.8% chase white. They add up to more than 100% precisely because they both include cases in which your decision doesn't matter. (You included the 90000 bad/bads in your denominator twice.)

No, I believe you are wrong. The probabilities that the black man did it and the probabilities that the white man did it have to add up to one because we are certain of that. They cannot add to more than one, which would be meaningless. You cannot have a higher degree of probability than 100%.

Your error should underline the unimportance of including the bad/bads to decide who to chase. Including them once leads to the answer - chase black. Leaving them out leads to the answer - chase black. Including them 1.5 times (like you sort of did) leads to the answer - chase black. Including them ten times leads to the answer - chase black.

I do not agree there is an error and I think anyone knowledgeable about probabilities will back me up. Anyone? (long silence ensues. . . )

Lance Turbo
01-04-2004, 08:34 PM
Originally posted by sailor
What are the probabilities that the black man did it and what are the probabilities that the white man did it?


1,710,000 cases have a good white guy and a bad black guy
360,000 cases have a bad white guy and a good black guy
90,000 cases have a bad white guy and a bad black guy

I think we agree on this.

Question 1: What are the probabilities that the black man did it?

A total of 1710000 + 90000 = 1800000 cases have a bad black man.
A total of 360000 cases have good black man.

There are 1710000 + 90000 + 360000 = 2160000 cases in all.

The black man is bad in 1800000/2160000 cases (about 83.3%)
The black man is good in 360000/2160000 cases (about 16.7%)

These add up to 1 (100%) because the black man is either good or bad. We are certain of this.

Question 2: What are the probabilities that the white man did it?

A total of 360000 + 90000 = 450000 cases have a bad white man.
A total of 1710000 cases have good white man.

There are 1710000 + 90000 + 360000 = 2160000 cases in all.

The white man is bad in 450000/2160000 cases (about 20.8%)
The white man is good in 1710000/2160000 cases (about 79.2%)

These add up to 1 (100%) because the white man is either good or bad. We are certain of this.
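
The two sets of percentages above can be reproduced in a few lines, which also shows why "black is bad" and "white is bad" together exceed 100%. A sketch only, assuming the three case counts agreed on above; the dictionary layout is illustrative.

```python
# (white man, black man) -> number of cases, honest/honest already excluded
cases = {
    ("good", "bad"): 1_710_000,
    ("bad", "good"): 360_000,
    ("bad", "bad"): 90_000,
}
total = sum(cases.values())

p_black_bad = sum(n for (w, b), n in cases.items() if b == "bad") / total
p_white_bad = sum(n for (w, b), n in cases.items() if w == "bad") / total
p_both_bad = cases[("bad", "bad")] / total

print(f"black man is bad: {p_black_bad:.1%}")
print(f"white man is bad: {p_white_bad:.1%}")
print(f"both are bad:     {p_both_bad:.1%}")
```

"Black is bad" and "white is bad" are not mutually exclusive events here, so their probabilities overlap by the bad/bad share; each one adds to 100% only with its own complement.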

Lance Turbo
01-06-2004, 12:03 PM
I hate to bring this up again, but I am wondering if we have reached an agreement.

Omphaloskeptic
01-06-2004, 10:48 PM
Originally posted by Lance Turbo
I hate to bring this up again, but I am wondering if we have reached an agreement.

Well, I agree with your analysis. But you probably noticed that my numbers of 19/23=82.6% and 4/23=17.4% (computed assuming that exactly one criminal is present) agree with your numbers for the same case, so that won't surprise you much.

sailor
01-07-2004, 08:49 PM
I still disagree with the result. Knowing only one of them did it, the probabilities that one and the other did it *have* to add to one. We agree that
A total of 1710000 + 90000 = 1800000 cases have a bad black man.
A total of 360000 + 90000 = 450000 cases have a bad white man.
Then it is obvious to me that P1 = 1800000 / (1800000+450000) = 80% and P2 = 450000 / (1800000+450000) = 20%

I am sticking to that answer as I find it totally obvious.

Omphaloskeptic
01-07-2004, 09:55 PM
Originally posted by sailor
I still disagree with the result. Knowing only one of them did it, the probabilities that one and the other did it *have* to add to one. We agree that
A total of 1710000 + 90000 = 1800000 cases have a bad black man.
A total of 360000 + 90000 = 450000 cases have a bad white man.
Then it is obvious to me that P1 = 1800000 / (1800000+450000) = 80% and P2 = 450000 / (1800000+450000) = 20%

I am sticking to that answer as I find it totally obvious.

If you know that only one of them did it (I am interpreting this as "we are given that there was exactly one criminal at the scene"--if this is not what you mean, could you clarify?) then why are you including the 90000 cases where both the black man and the white man are criminals?

We have this table (these are the numbers originally presented by Lance Turbo):

                 Black innocent   Black criminal
White innocent   6840000          1710000
White criminal   360000           90000

If you know that there is exactly one criminal at the scene, then the only two possible cases, of the four in this table, are the table's upper-right corner (white innocent; black criminal) and lower-left corner (white criminal; black innocent). The 90000 cases where both are criminal are not relevant, since they are not possible in this scenario; even if they were relevant they should not be double-counted as you are doing.

The relevant fractions are then the ones Lance Turbo first calculated:
  P1 = 1710000/(1710000+360000) = 82.6% and P2 = 360000/(1710000+360000) = 17.4%.
Why do you find your answers more obvious than these?

Omphaloskeptic
01-07-2004, 10:52 PM
Originally posted by KP
The apparently contradictory finding of Example C is not caused by the HIGH 100% rate, but by the ULTRA-LOW rates: prevalence of women, and prevalence of driving males (small number of blacks, and almost total noncriminality of whites in Example C)
...
As you can see, the "criminality of blacks" is irrelevant in Oomphaloskeptic's Example C. Even when there is not one single black criminal, but 450 white criminals, the ultra-low white rate makes black men "the likeliest candidate". I don't know what "0.1" black criminal is, but it's way less than "one single criminal".

Apparently even "bad thoughts" by a black man should affect a cop's decision of whom to chase, more than 450 actual White criminals, if the "white crime rate" is low enough.

This is misleading. The criminality of blacks is not irrelevant in my Example 3 (is this what you're calling Example C?). From your examples it may appear so, but this is only because you haven't lowered the "black crime rate" enough--even in your final case, with 0.1 black criminal, the black crime rate is higher than the white crime rate, and so it should not be surprising that the probability is greater than 50% that the black man is the criminal. If you had lowered it to 0.05 or 0.01, you would have found (going through the same math as in my original reply) that the officer should chase the white man: not surprising, since at that point the black crime rate would be less than the white crime rate.

Quantitatively:
  With 0.1 black criminal, the black crime rate is 0.050% (higher than the white crime rate of 0.045%); the probability that the white guy is guilty and the black guy is innocent is 47.4%.
  With 0.01 black criminal, the black crime rate is 0.005% (lower than the white crime rate of 0.045%); the probability that the white guy is guilty and the black guy is innocent is 90.0%.

The "bad thoughts" remark is just silly, and the remarks about fractional criminals are a red herring. You introduced the fractional criminals, so claiming that you don't know what 0.1 criminal means gets no sympathy from me. But a value of "0.1 criminal" makes sense, and doing the above calculations is still correct, if (for one example--there are other interpretations) the table shows the results of averaging data over the past 100 months. The table can be viewed, when properly rescaled, as a table of joint probabilities for two random variables (RACE and HONESTY); in this formulation there's no reason that the values in the table need to be integral.

This makes Example C so sensitive to black misdeeds, and so forgiving of the prospect of white misdeeds that it actually says you should arrest the black when there isn't a single black criminal, but there are hundreds of white criminals.

This is, once again, only true as long as the black crime rate is higher than the white crime rate. I fail to see why this is so counterintuitive.

In fact, under Example C you could raise the White criminal rate to 100%, and the answer still wouldn't be "arrest the white man".

Is this really so surprising? I mean, Example 3 was the case where all blacks were criminals. If all whites are criminals too, well, it doesn't much matter who we go after, does it? And, in fact, if you work through my original analysis in this case... that's what it will tell you too! (NB: if you actually make everyone a criminal the particular analysis above will actually produce ill-defined behavior; but if you make rates of black and white crime identical, for any crime rate between zero and one they will produce the same, completely unsurprising, result: that the two men are equally likely to be guilty.)
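
The dependence on the two crime rates can be sketched in a few lines of Python. This is only an illustration under the "exactly one criminal" assumption, with the two men treated as independent draws; the function name p_white_guilty is hypothetical, and the rates plugged in are the ones quoted in the quantitative lines above.

```python
def p_white_guilty(white_rate, black_rate):
    """P(white man is the lone criminal | exactly one of the two is a criminal)."""
    white_only = white_rate * (1 - black_rate)   # white criminal, black innocent
    black_only = black_rate * (1 - white_rate)   # black criminal, white innocent
    return white_only / (white_only + black_only)

white_rate = 0.00045  # 0.045%, the white crime rate quoted above
for black_rate in (0.0005, 0.00005):  # 0.050% and 0.005%
    print(f"black rate {black_rate:.3%}: white guilty {p_white_guilty(white_rate, black_rate):.1%}")
```

Equal rates strictly between zero and one give exactly 0.5, and setting both rates to 0 or both to 1 makes the denominator zero, which is the ill-defined case mentioned in the NB.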

muttrox
01-08-2004, 06:32 AM
I'm glad to see that after skipping most of this thread, I got the same answer as Omph and Lance. Guess that statistics degree stuck.

Odds the black man is guilty = .25 (200/800)
Odds the white man is guilty = .052 (450/8,550)

.25/.052 = ~4.8, meaning the black man is 4.8 times more likely than the white man to be the guilty one. That works out to ~83% of the time, the black man is the guilty one. This is exactly the same reasoning as Lance and Omph used in a different form (and IMHO, much easier to understand). Note that all 4 numbers are used, so as we intuitively expect, changing any of the parameters will change the answer.

This avoids the whole question of bad/bads... and it seems to me the OP was essentially assuming that one and exactly one of the suspects was guilty.
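
The same odds-ratio shortcut as a small Python sketch, assuming (as above) that exactly one of the two is guilty. The variable names are illustrative only.

```python
# Odds of "bad" versus "honest" within each group, from the original table.
black_odds = 200 / 800
white_odds = 450 / 8550

# How many times larger the black man's odds are than the white man's.
odds_ratio = black_odds / white_odds

# Convert the odds ratio back into a probability that the black man is the guilty one.
p_black = odds_ratio / (1 + odds_ratio)

print(f"odds ratio {odds_ratio:.2f}, black guilty {p_black:.1%}")
```

Without intermediate rounding the ratio comes out at 4.75; the ~4.8 above comes from rounding 450/8,550 to .052 first. Either way the result is about 83%.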

Jurph
01-08-2004, 08:59 AM
If I'm the cop? I chase the guy I can't file a good description of later. Black guy with hair cut in a wild 1992 slanted box, with a bleached stripe in it, wearing bright blue Nike sneakers and an Iverson jersey? Let him run. I'm going after the 5'10" white guy with dirty blonde hair wearing jeans, beat-up sneakers, and a blank gray sweatshirt.

sailor
01-08-2004, 09:33 AM
Yes, it is assumed only one of them did it but, the other guy could be a bad guy by chance even if not involved. I am having great trouble accepting any of the conclusions presented here because they seem to lead to apparent contradictions.

muttrox
01-08-2004, 10:31 AM
Sailor, I don't see the contradiction (I've gone back about 20 posts). What is your exact objection? The only thing I see you state is that probabilities should add up to one, which is satisfied.

Omphaloskeptic
01-08-2004, 01:15 PM
Originally posted by sailor
Yes, it is assumed only one of them did it but, the other guy could be a bad guy by chance even if not involved.

OK, this is a different assumption than I was using. I have been assuming that the officer knows that exactly one criminal (not zero, not two) is present at the scene. In your case, where at least one criminal is present, we can eliminate only the upper-left corner of the table

                 Black innocent   Black criminal
White innocent          6840000          1710000
White criminal           360000            90000

This gives us (this is just reiterating Lance Turbo) a total of
    1710000+360000+90000 = 2160000
possible pairings, in three categories:
    1710000/2160000 = 79.2% have a white innocent and black criminal present;
     360000/2160000 = 16.7% have a black innocent and white criminal present;
      90000/2160000 =  4.2% have both a white criminal and black criminal present.
For the final 4.2% of cases, we don't have enough information to make an informed guess as to whom to chase.

Like muttrox, I don't know exactly what your objection is and so I'm not sure how to address it.
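
A short Python sketch of the "at least one criminal" breakdown above, using the three remaining cells of the table. The dictionary labels are just for illustration.

```python
# The three cells that remain once "both innocent" is ruled out.
cells = {
    "white innocent, black criminal": 1710000,
    "white criminal, black innocent": 360000,
    "both criminal": 90000,
}

total = sum(cells.values())  # total possible pairings
shares = {label: count / total for label, count in cells.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Each pairing is counted once, so the three shares sum to one without any cell appearing twice.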

Lance Turbo
01-08-2004, 02:48 PM
Now we're getting somewhere.

Originally posted by sailor
Yes, it is assumed only one of them did it but, the other guy could be a bad guy by chance even if not involved.

This is not an assumption that has been made by anyone else in this thread, but now that you have asserted it, we can get to the definitive answer.

1710000 cases bad black, good white
360000 cases good black, bad white
90000 cases bad black, bad white

2160000 cases in all. Period. Not even one more.

Knowing that exactly one guy committed the crime allows us to divide the 90000 bad/bads further.

45000 guilty black guy, innocent bad white guy
45000 guilty white guy, innocent bad black guy

Why can we divide this group exactly in half? Because...

Originally posted by sailor
The bad guys are about equally bad...

So now we can compute the odds

(1710000 + 45000)/2160000 = 13/16 = 81.25% black guy did it.
(360000 + 45000)/2160000 = 3/16 = 18.75% white guy did it.

These add up to unity, so it should be obvious that it is correct.

sailor, your 80% 20% results require the use of 2250000 for your denominator. This number is 1710000 + 360000 + 90000 + 90000. That second 90000 is just wrong. No other way to look at it.

A total of 1710000 + 90000 = 1800000 cases have a bad black man.
A total of 360000 + 90000 = 450000 cases have a bad white man.

These are two statements that have been agreed upon. However, not every case with a bad black man has a guilty black man, and not every case with a bad white man has a guilty white man.
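
The split can be checked with exact fractions. This is a rough Python sketch that assumes, per the "about equally bad" premise, that in the bad/bad cases each man is equally likely to be the guilty one; the variable names are illustrative.

```python
from fractions import Fraction

only_black_bad = 1710000   # bad black, good white
only_white_bad = 360000    # good black, bad white
both_bad = 90000           # bad black, bad white
total = only_black_bad + only_white_bad + both_bad

# Half of the bad/bad cases are assigned to each man (the equal-badness assumption).
p_black = Fraction(only_black_bad, total) + Fraction(both_bad, total) / 2
p_white = Fraction(only_white_bad, total) + Fraction(both_bad, total) / 2

print(p_black, p_white, p_black + p_white)
```

Using Fraction keeps the results as exact ratios, so they can be compared directly with 13/16 and 3/16.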

sailor
01-08-2004, 04:47 PM
Ok, let me ask you this.

                        Problem #1   Problem #2   Problem #3

Cases where
White is bad                2*10^8           20            p

Cases where
black is bad                8*10^8           80            q

Cases where it
could be either                100      10*10^8            r
with equal probab.

I say answer is
Probability should
be close to                  20/80        50/50

I say the answer in each case is

P1 = (p+r) / (p+q+2r)
P2 = (q+r) / (p+q+2r)

which makes sense to me because those r cases have both bad guys and are counted on both sides.
Forget about specific numbers. What are your answers for P1 and for P2 as a function of p, q and r in problem #3 above?

Lance Turbo
01-08-2004, 05:09 PM
P1 = (p + r/2)/(p + q + r)
P2 = (q + r/2)/(p + q + r)

The total number of cases is clearly p + q + r.

Why on earth would the total number of cases be p + q + 2r?

sailor
01-08-2004, 05:31 PM
You are probably right. Not certainly right but probably right. :)

Lance Turbo
01-08-2004, 05:38 PM
Probably right is good enough for me on a probability problem. ;)

Omphaloskeptic
01-08-2004, 06:03 PM
sailor, consider the case where p=r and q=0 (i.e., there are an equal number of cases where white is definitely bad and where either white or black is bad with equal probability; black is never definitely bad). In this case white is bad 3/4 of the time (half the time from the definite cases, and half of the other half of the cases). Your formula gives 2/3 instead.
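
The two candidate formulas can be compared side by side. This is a minimal Python sketch; the function names split_evenly and double_count are labels invented here for Lance Turbo's and sailor's formulas for P1 (white guilty), with p, q and r as defined in Problem #3.

```python
from fractions import Fraction

def split_evenly(p, q, r):
    """P1 = (p + r/2) / (p + q + r): each ambiguous case counted once, half to each side."""
    return Fraction(2 * p + r, 2 * (p + q + r))

def double_count(p, q, r):
    """P1 = (p + r) / (p + q + 2r): each ambiguous case counted on both sides."""
    return Fraction(p + r, p + q + 2 * r)

# The test case above: p = r, q = 0.
print(split_evenly(1, 0, 1), double_count(1, 0, 1))

# The thread's own numbers: p = 360000, q = 1710000, r = 90000.
print(split_evenly(360000, 1710000, 90000), double_count(360000, 1710000, 90000))
```

With p = r and q = 0 the first formula gives 3/4 and the second gives 2/3; with the thread's numbers they give 3/16 and 1/5 respectively.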