Statistics & profiling problem

Then again, the problems raised so far can also be used to understand The Problems With Profiling.

The guys could be running from the scene because they are afraid of being wrongly harassed by the police. And in fact we would expect this to be more of a factor for the member of the group that profiling suggests we should harass. So, the person that profiling says we should harass is running because he knows he’s going to get blamed, and the guy that profiling says we shouldn’t harass is running because he’s guilty. So the game theory approach to the problem says we should catch the guy that we don’t think we should catch.

And suddenly the reasoning starts reminding me way too much of The Princess Bride.

Luis Tiant, a Cuban-born pitcher for the Red Sox, was stopped for jogging in the predominantly white neighborhood in which he lived. The cop’s reasoning was that a black man running in that vicinity must be fleeing from something.

Please! This is not a discussion about racial profiling so I would appreciate it if those wishing to discuss racial profiling would take their discussion to another thread. My question is 100 percent a probability problem and the relevant data is given in my OP. The question is: given the information the cop has, what can he infer with regards to who is the most likely suspect? That is the question. The fact that in the real world there is no town named X, or other similar real-world considerations, are totally outside the problem I am presenting. I am interested in learning about statistics, not about the racial problems of America.

>A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both so he has to choose which one to go after.
>This is not a discussion about racial profiling

sailor, you strain me!

Q. Delta flight #45678 leaves Washington DC and flies towards its destination located 6000 miles away at a speed of 2000 mph. How long will it take to get there?

A. There is no Delta flight #45678
There is no airport located in the District of Columbia
No commercial airliner can fly at 2000 mph

You think that would be considered the correct answer in most schools and colleges? :rolleyes:

It seems you cannot mention race, guns, abortion, religion and a number of other hot issues, no matter how in passing, without the thread being hijacked to hell by people who want to argue their pet subjects. Please take it to another thread. This is about mathematical probabilities.

Common sense math answer that will surely infuriate mathematicians:

There’s a 9% chance that of the two men one will be white and one will be black.

So we now know that this combination is rare in this town.

Calculating the probability that the Black dude will be a criminal gives a 1.8% possibility looking at the entire population; or a 2.9% possibility looking at the criminal population.

Calculating the probability that the White dude will be a criminal gives a 4.5% possibility looking at the entire population. It gives a 5.4% possibility looking at the population of criminals.

Thus about 5-2 odds that chasing the White dude will get the cop a collar, given no other information.

ALL RIGHT, ALL RIGHT STOP IT !

IT WAS ME, OK ?

I admit it.

hroeder, you have thrown in a new perspective which had not occurred to me and which makes sense but I think you may have the numbers wrong. Let me do some math and see what I get.

A) The white guy did it. Then the black guy can be any of the 1000 black guys in town, good or bad.

B) The black guy did it. Then the white guy can be any of the 9000 white guys in town, good or bad.

So, we have 9000 cases in which the black guy did it versus 1000 cases in which the white guy did it. The reasoning seems correct to me and yet the conclusion seems wrong in that the process does not even take into account the numbers of bad guys in each group. I assume the process is wrong but I cannot put my finger on it. Someone?

I still can’t put my finger on it but it seems more reasonable to use strictly the numbers of bad guys in which case the probability is 70% that the white guy did it. I feel quite confident with that.

At least I suppose that is a valid first analysis. Then I suppose it could be refined, but I still doubt the result obtained by Omphaloskeptic that the probability is only 17% that the white guy did it. I think there has to be some error along the way.

The answer may be neither number but right now, if I had to pick between 17% and 70% I would pick 70% that it was the white guy. I would also bet that the correct answer is not higher than 70% but not as low as 17%.

Hroeder. I largely agree, but with a significant distinction.

There are 450 white criminals and 200 black criminals. With no data on any differences in extent of the criminal records of black vs. white criminals, we can only consider them equally “criminal”, making the odds 450:200 (69.23%) that any crime is committed by a white.
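The 450:200 figure can be checked with a few lines of Python. This is only a sketch of the arithmetic in the post above; it assumes, as the post does, that every criminal is equally likely to be the perpetrator, and the function name `share_white` is made up for illustration.

```python
from fractions import Fraction

# Criminal counts quoted in the post above.
white_criminals = 450
black_criminals = 200

def share_white(white, black):
    """Chance that a randomly chosen criminal is white.

    Assumption: all criminals are equally likely to have committed the crime.
    """
    return Fraction(white, white + black)

p = share_white(white_criminals, black_criminals)
print(p, f"{float(p):.2%}")  # 9/13 69.23%
```

Note that this number uses no information about who was seen at the scene; it is the estimate before the officer observes anyone.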

As I said earlier (but apparently, not well): the number or race of innocent people is irrelevant. By the assumptions of the problem, if the entire town were present at a speech and the speaker was assassinated, you’d have 650 suspects, not 10,000. If 1000 (honest) FBI agents were present, there are still 650 suspects, not 11,000. If the town’s 1000 Quakers, black and white, are never criminals, and didn’t attend, there are still 650 suspects, not 9000. If “all criminals are equal”, then there’s a 69.23% chance the assassin is white, even if everyone flees the scene in fear.

If ONLY criminals commit crimes, only criminals are relevant. It’s always possible to calculate the criminal %age for any group (e.g. by last digit of SSN), but that merely compares the irrelevant innocents to the relevant criminals.

The population of China is also innocent of this crime. Should we use the ratio of black criminals to native Chinese to decide between the black and white suspect? The fact is: the ratios of black and white criminals to native Chinese actually give the right answer! Why? Because we’re using the same denominator to compare both sets of criminals. “Innocents of the same race” or “total population of same race” are inappropriate, skewing denominators.

Yes, I think KP’s analysis is right. At least initially the odds are 450/200 that a white guy did it and that is the conclusion if the officer arrives at the scene and sees no one.
The question is whether seeing a white guy and a black guy adds any relevant information which would alter the odds and that is where I am not clear but right now I can’t see how it does.

If it did it would lower the figure somewhat but I can’t see how it could lower it to anything close to 17%, even 50% would seem a stretch.

OK, let’s change the parameters and see if the conclusion (that the odds are 450:200) makes sense.

Here are the original statistics (corrected):


        Honest    Bad   Total 
White:    8550    450    9000
Black:     800    200    1000
Totals:   9350    650   10000

Now let’s consider an extreme case (Example 2 for reference):


        Honest    Bad   Total 
White:    9350    450    9800
Black:       0    200     200
Totals:   9350    650   10000

Notice that I have not changed the “Bad” column at all. I’ve only changed the 800 honest blacks to whites; in our new fictional town of East X there are no honest blacks. Is it still the case that the odds are 450:200 that a white guy did it, even though the black guy is a guaranteed criminal?

I can make this even more extreme by adding honest whites (Example 3):


        Honest    Bad   Total 
White:  999350    450  999800
Black:       0    200     200
Totals: 999350    650 1000000

In the bustling metropolis of Lower East X, with a white criminal proportion of 0.045% and a black criminal proportion of 100%, there are still 450 bad whites and 200 bad blacks. Is it still 450:200 for the white guy?

I’m trying to come up with a more intuitive example, but (to me at least) these two extreme cases make the 450:200 odds seem very unreasonable. Let me try another explanation of the new information: You have two populations (whites and blacks). The relevant feature of these populations is that the smaller population has a higher proportion of criminals. You see two suspects at the scene, and (by the tacit assumption in the problem) exactly one is guilty. The fact that exactly one of the suspects from the smaller, higher-crime population is at the scene is relevant because it is a relatively unlikely occurrence.

The reason that the innocent populations are not irrelevant here is that there are two people at the scene, one guilty and the other innocent. This is statistically more likely if the guilty man is from a more-guilty population, but also if the innocent man is from a more-innocent population.
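One way to make this concrete is to count (guilty, innocent) pairs for each of the three towns. The sketch below is not taken from any post in the thread: it assumes the guilty man is a uniformly random criminal and the other man is a uniformly random honest resident, and the function name `p_white_guilty` is hypothetical.

```python
from fractions import Fraction

def p_white_guilty(honest_white, bad_white, honest_black, bad_black):
    """P(white man is guilty | one white and one black man seen, exactly one guilty).

    Assumption: guilty man = random criminal, other man = random honest resident.
    """
    white_guilty_pairs = bad_white * honest_black  # guilty white + innocent black
    black_guilty_pairs = bad_black * honest_white  # guilty black + innocent white
    total = white_guilty_pairs + black_guilty_pairs
    if total == 0:
        raise ValueError("no mixed guilty/innocent pair is possible")
    return Fraction(white_guilty_pairs, total)

# (honest white, bad white, honest black, bad black) from the three tables.
towns = {
    "X (original)": (8550, 450, 800, 200),
    "East X (Example 2)": (9350, 450, 0, 200),
    "Lower East X (Example 3)": (999350, 450, 0, 200),
}
for name, counts in towns.items():
    print(f"{name}: {float(p_white_guilty(*counts)):.1%}")
```

Under these assumptions the original town gives 4/23 (about 17%) for the white man, and both extreme examples give 0, because no innocent black bystander exists there.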

(The innocent population of China, etc., is irrelevant to the question not because of its innocence but because it is not part of the problem universe. Charles Manson is also irrelevant to the question for the same reason, even though he’s guilty.)

I have to admit I am totally perplexed by what seems such a simple probability problem. Several analyses all seem correct to me and yet they cannot all be correct as they lead to contradictory results. Then I can find fault with all. As I say, I am perplexed.

Omphaloskeptic makes a valid point. I’m not sure I agree with his explanation, but his numeric argument is solid.

I’ve played with some numbers, and it appears that there is some effect beyond the population size effect [Effect #1 in my original post] that seems to be nonlinear - perhaps a factor of x/(1-x) - “the ratio between empty space and filled space” in a fixed size container. Such equations behave differently when compared quantities are near each other [“in the same regime”] vs. when they are far apart.

I overlooked this factor in the original example where both quantities were in the same regime. I’ll need to work out a single unified equation to understand in which regimes the various factors predominate. I’ll report back then.

But again: Omphaloskeptic is right: I was wrong to say the number of innocents is completely irrelevant. (We all know what happens when people ignore, bury or forget their mistakes, so I try to emphasize mine.)

I don’t know Bayesian math, but it seems to me that whatever solution is correct must fulfill the following criteria, if B is the fraction of blacks which are bad and W is the fraction of whites which are bad, regardless of the overall populations:
1. If B = W, then color and honesty are independent variables, and so it’s exactly 50/50.
2. If B = 0, then it’s 100% certain that the cop should pursue the white one, and if W = 0, then it’s 100% certain that the cop should pursue the black one.
3. If B = 1, then it’s 100% certain that the cop should pursue the black one, etc.
4. If B = W = 1 or B = W = 0, then the best choice is undefined.

Does anyone disagree with these? Because some of the solutions so far contradict them.
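These criteria can be tested against a candidate formula. The sketch below uses one candidate, P(white guilty) = W(1-B) / (W(1-B) + B(1-W)), which comes from counting (guilty, innocent) pairs and cancelling the population sizes. That formula is an assumption of this example rather than something stated in the post, and the function name is hypothetical.

```python
def p_white_guilty(W, B):
    """Candidate P(white man is guilty), given the bad fractions W and B.

    Assumption: exactly one of the two men is guilty, and the other is a
    random honest resident. Returns None when the answer is undefined (0/0).
    """
    white_guilty = W * (1 - B)  # white man bad, black man honest
    black_guilty = B * (1 - W)  # black man bad, white man honest
    total = white_guilty + black_guilty
    if total == 0:
        return None
    return white_guilty / total

# (label, W, B) cases matching the four criteria, plus the original town.
checks = [
    ("1. B = W", 0.3, 0.3),
    ("2. B = 0", 0.3, 0.0),
    ("2. W = 0", 0.0, 0.3),
    ("3. B = 1", 0.3, 1.0),
    ("4. B = W = 1", 1.0, 1.0),
    ("4. B = W = 0", 0.0, 0.0),
    ("town X (W = 5%, B = 20%)", 0.05, 0.20),
]
for label, W, B in checks:
    print(label, p_white_guilty(W, B))
```

This candidate satisfies all four criteria and depends only on W and B, not on the overall populations.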

Achernar, in simple terms what you are saying is that the cop should look at the probabilities that each individual is bad. Your assertion makes sense from a certain point of view and this is the problem I am having: different POVs yield different answers. Look:

The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys. So he should chase Mr White. The logic seems irrefutable.

But now he forgets about the crime when he sees the two guys and thinks (as you point out) the probability that a black individual is a bad guy is four times higher than the probability that a white guy is bad. So he should chase Mr Black.

Both reasonings look sound to me and yet they can’t both be true because they are contradictory.

Probably the correct answer to the probability of the white guy being who did it lies somewhere between the upper 70% and the lower 20% which each scenario yields. But that is still a huge range and I do not know what logical process would combine both factors to give the correct answer.
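One candidate for such a process is Bayes’ rule in odds form: start from the 450:200 odds for the crime, then multiply by how likely it is that the innocent companion turns out to be of the other race. The sketch below assumes the companion is a uniformly random honest resident (an assumption, not something given in the problem) and uses the counts from the corrected table.

```python
from fractions import Fraction

# Counts from the corrected table for town X.
bad_white, bad_black = 450, 200
honest_white, honest_black = 8550, 800
honest_total = honest_white + honest_black

# Step 1: odds (white : black) that the perpetrator is white, before seeing anyone.
prior_odds = Fraction(bad_white, bad_black)

# Step 2: chance of seeing a companion of the *other* race in each scenario.
# Assumption: the companion is a random honest resident.
lik_if_white_guilty = Fraction(honest_black, honest_total)  # companion is black
lik_if_black_guilty = Fraction(honest_white, honest_total)  # companion is white

# Step 3: posterior odds = prior odds * likelihood ratio.
posterior_odds = prior_odds * lik_if_white_guilty / lik_if_black_guilty
posterior_p = posterior_odds / (1 + posterior_odds)

print("prior odds:", prior_odds)          # 9/4
print("posterior odds:", posterior_odds)  # 4/19
print("P(white guilty):", posterior_p)    # 4/23
```

On these assumptions the 70% figure is the starting point and the sighting of the two men is the update, which moves the answer to 4/23, or about 17%.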

Case 4 is contained in case 1, isn’t it?

Well, as I wrote it, case 4 contradicts specific cases of 1, 2, and 3. But I meant for it to supersede them.