The Straight Dope

  #1  
Old 01-01-2004, 10:12 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Statistics & profiling problem

Reading a thread going on in GD this question occurred to me.

The town of X has a population of 10,000, of which 10% are black and 90% are white.
Of blacks, 20% are criminals and 80% are honest, while of whites 5% are criminals and 95% are honest.
Code:
        Honest    Bad   Total 
Whites:   8550    450    9000 
Black:     800    200    1000 
Totals:   9450    650   10000
The bad guys are about equally bad so that about 70% of crimes are committed by white bad guys and 30% by black bad guys.
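As a quick check of the figures above, here is a minimal Python sketch that rebuilds the table from the stated percentages. The variable names are illustrative and not part of the original post, and "about equally bad" is read as an assumption that every bad guy commits the same number of crimes.

```python
# Rebuild the town's table from the percentages in the post.
# Names are illustrative; "equally bad" is assumed to mean each
# criminal commits the same number of crimes.
population = 10_000
share = {"white": 0.90, "black": 0.10}          # population split
criminal_rate = {"white": 0.05, "black": 0.20}  # criminal fraction per group

table = {}
for group in share:
    total = round(population * share[group])
    bad = round(total * criminal_rate[group])
    table[group] = {"honest": total - bad, "bad": bad, "total": total}

total_bad = sum(row["bad"] for row in table.values())
total_honest = sum(row["honest"] for row in table.values())
crime_share = {g: table[g]["bad"] / total_bad for g in table}

for group, row in table.items():
    print(group, row, f"share of crimes: {crime_share[group]:.1%}")
print("honest total:", total_honest, "bad total:", total_bad)
```

Under these assumptions the white share of crimes comes out at 450/650, or about 69%, and the honest column sums to 9350 rather than the 9450 printed in the table.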

A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both so he has to choose which one to go after.

Hmmm, he thinks, 70% of crimes are committed by white guys which is more than twice the percentage of crimes being committed by black guys. I should therefore go after the white guy.

Oh, but wait, taken individually the white guy has a probability of 5% of being a criminal but the black guy has a probability of 20% which is 4 times higher. I should therefore go after the black guy.

What is the true, correct and definitive answer? Who should the officer chase?
__________________
Posted using 100% recycled electrons.
  #2  
Old 01-01-2004, 10:18 AM
don't ask don't ask is offline
Member
 
Join Date: May 2001
Location: Sydney, Australia
Posts: 14,944
He should wound one in the leg and then chase the other.
  #3  
Old 01-01-2004, 11:47 AM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
This looks like a problem that can be addressed by Bayes' theorem, but I'm not 100% sure how to apply it here.
  #4  
Old 01-01-2004, 01:38 PM
Orbifold Orbifold is offline
Guest
 
Join Date: Oct 2000
First the math, then the explanation of why the math is probably not applicable anyway.

So: mathematically, assuming that the black man is randomly selected from the set of all black people in the town of X, then the probability that he is a "bad guy" is 20% as you've stated. So in that sense, the black guy is more likely to be a "bad guy" than the white guy (assuming that the white guy is a randomly selected white guy as well). The fact that there are more white bad guys than black ones means that if you select a bad guy at random then he's more likely to be white than black, but that's not what's happening in the situation you describe.

But of course, these people aren't randomly selected, are they? It's not as if the census bureau picked one white man and one black man at random and dropped them at the scene. So statistical conclusions that assume the people are randomly selected, like the one in the previous paragraph, are of questionable merit at best. For all the cop knows there's a secret white bad guy convention going on down the street.

The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.
  #5  
Old 01-01-2004, 03:20 PM
robcaro robcaro is offline
Guest
 
Join Date: Nov 2001
Well, while the cop is doing the math, both of them will get away. I suggest that we should agree with don't ask. Too bad that we have a racial system in the US. Cops shouldn't be asked to judge between black or white.
__________________
A committee is a thing which takes a week to do what one good man can do in an hour. ~Elbert Hubbard
  #6  
Old 01-01-2004, 03:26 PM
Napier Napier is offline
Charter Member
 
Join Date: Jan 2001
Location: Mid Atlantic, USA
Posts: 7,218
>Hmmm, he thinks, 70% of crimes are committed by white guys which is more than twice the percentage of crimes being committed by black guys. I should therefore go after the white guy.

I think this is a red herring. It would have influenced how likely it would be that one white and one black are leaving the scene, but that's a given and so the statistic is not useful here.

>Oh, but wait, taken individually the white guy has a probability of 5% of being a criminal but the black guy has a probability of 20% which is 4 times higher. I should therefore go after the black guy.

I think this is the relevant statistic. It's also the correct conclusion, if we're given that the cop should take advantage of the statistical information and also that he has no other criteria to use. These aren't necessarily trivial points.

>The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.

I think this isn't correct, in two ways. The statistical information is certainly "information to suggest which suspect is more likely to be a criminal", though it is certainly weak information that still leaves the criminal's identity quite uncertain. Weak information is certainly more useful than none at all. And, I don't hear anything about assumptions here, founded or not.

BTW, sailor, you're not afraid to ask the tough questions, are you? Perhaps we should substitute "army men" for "white men" and "navy men" for "black men". After all, why offend anybody needlessly?
  #7  
Old 01-01-2004, 03:54 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
It's not quite right to suggest that just because you don't know everything you can't know anything. In the absence of complete information you can only make guesses, of course, and these guesses will sometimes be wrong; but this is a far cry from having no information at all. Using probability theory is one way of trying to make better guesses: i.e. guesses which are, statistically, more likely to be correct. Of course this relies on various assumptions (e.g. statistical independence between various random variables), and if your assumptions are badly wrong then you might make worse guesses instead.

A Bayesian might analyze the situation as follows:

Initially the officer sees two men, X and Y, running from the scene, and assigns each of them equal prior probabilities of being the criminal:
    P(Y,¬X) = 1/2        (prior probability that Y is criminal and X is not)
    P(X,¬Y) = 1/2        (prior probability that X is criminal and Y is not)
As he approaches, he sees further that X is black (Xb) and Y is white (Yw). He now updates his priors with the new information, using Bayes' Law (linked by ultrafilter above):
    P(Y,¬X | Xb,Yw) = P(Xb,Yw | Y,¬X) P(Y,¬X) / P(Xb,Yw)
    P(X,¬Y | Xb,Yw) = P(Xb,Yw | X,¬Y) P(X,¬Y) / P(Xb,Yw)
The denominator (basically a normalization factor) is computed by summing over all possibilities:
    P(Xb,Yw) = P(Xb,Yw | Y,¬X) P(Y,¬X) + P(Xb,Yw | X,¬Y) P(X,¬Y) .

Now how can we compute P(Xb,Yw | Y,¬X) (the probability that X is black and Y is white, given that Y is criminal and X is not)? We might assume (in the absence of more comprehensive statistical information) that the probability that X is black does not depend on whether Y is criminal, i.e. that the actions of X and Y are basically independent of each other. In this case we can write
    P(Xb,Yw | Y,¬X) = P(Xb | ¬X) P(Yw | Y) .
Now P(Xb | ¬X) (the probability that X is black, given that he is not a criminal) and P(Yw | Y) (the probability that Y is white, given that he is a criminal) are given in the statistical tables provided:
    P(Xb | ¬X) = 800/9350 = 16/187 [note typo "9450" in table]
    P(Yw |   Y) = 450/650   = 9/13
So
    P(Xb,Yw | Y,¬X) =   (16/187) (9/13) = 144/2431
and similarly
    P(Xb,Yw | X,¬Y) = (171/187) (4/13) = 684/2431
so
    P(Xb,Yw) = (144/2431)(1/2) + (684/2431)(1/2) = 414/2431
and the updated posterior probabilities are

    P(Y,¬X | Xb,Yw) = (144/2431)(1/2) / (414/2431) =   72/414 =  4/23        (posterior probability that Y is criminal and X is not)
    P(X,¬Y | Xb,Yw) = (684/2431)(1/2) / (414/2431) = 342/414 = 19/23        (posterior probability that X is criminal and Y is not)

This approach, of updating a priori probabilities to reflect new information using Bayes' Law, is called Bayesian inference. It's an extremely useful statistical technique, though (as with all statistical techniques) it relies on having valid data and assumptions.
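For anyone who wants to check the arithmetic, the update above can be reproduced with exact fractions. This is only a sketch under the same assumptions as the post (equal priors, exactly one of the two men guilty, and the two men's statuses independent); the function and variable names are made up for illustration.

```python
from fractions import Fraction

# Counts from the corrected table (honest total 9350, bad total 650).
honest = {"white": 8550, "black": 800}
bad = {"white": 450, "black": 200}

def race_given_status(race, criminal):
    """P(race | criminal status), read straight off the table columns."""
    column = bad if criminal else honest
    return Fraction(column[race], sum(column.values()))

prior = Fraction(1, 2)  # equal priors for "Y did it" and "X did it"

# X is black, Y is white; their statuses are assumed independent.
like_y_guilty = race_given_status("black", False) * race_given_status("white", True)
like_x_guilty = race_given_status("black", True) * race_given_status("white", False)

evidence = like_y_guilty * prior + like_x_guilty * prior
post_y_guilty = like_y_guilty * prior / evidence  # white man is the criminal
post_x_guilty = like_x_guilty * prior / evidence  # black man is the criminal

print(post_y_guilty, post_x_guilty)  # 4/23 19/23
```

Using `Fraction` rather than floats keeps every intermediate value (144/2431, 684/2431, 414/2431) exact, so the result can be compared directly with the hand calculation.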

Depending on how you define "better," it may be appropriate to consider factors besides the probabilistic results; some of these other factors come into arguments against profiling. (The game-theoretic aspects of policing, for example, mean that the actions the officer takes in this round may affect the approaches taken by the parties in future rounds.)
  #8  
Old 01-01-2004, 04:59 PM
Orbifold Orbifold is offline
Guest
 
Join Date: Oct 2000
Quote:
Originally posted by Napier
And, I don't hear anything about assumptions here, founded or not.
The assumption I was referring to was the statistical assumption that the men at the scene were randomly selected from the set of all such men in the city. It's unstated, but it's not possible to compute probabilities (such as Omphaloskeptic has done) without making some such assumption, as Omphaloskeptic has correctly noted.
  #9  
Old 01-01-2004, 05:31 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
Quote:
Originally posted by Orbifold
The assumption I was referring to was the statistical assumption that the men at the scene were randomly selected from the set of all such men in the city. It's unstated, but it's not possible to compute probabilities (such as Omphaloskeptic has done) without making some such assumption, as Omphaloskeptic has correctly noted.
Well, yes, but I think your post was unnecessarily pessimistic with
Quote:
The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.
Of course all assumptions are mathematically "unfounded" (that's why they're assumptions), but some assumptions are more reasonable than others. If we require that the officer just throw up his hands in defeat, or maybe flip a coin, unless he can determine with certainty which one is the criminal, then we may as well disband the police forces. Even if he captured both suspects (like don't ask suggests) and questioned them both, and one admitted guilt and the other protested his innocence, ... well, maybe they're both good liars. He still has "no information" unless he makes the (unfounded) assumption that they don't have some bizarre reason to conspire to fool him. I think it's reasonable to consider this last scenario rather unlikely, though, and more generally, reasonable to make inferences based on statistical information. Mathematical certainty is never going to be possible here; all you can hope to do is use all the information you have and try to quantify your errors.
  #10  
Old 01-01-2004, 06:44 PM
drachillix drachillix is offline
Member
 
Join Date: Jun 2000
Location: 192.168.0.1
Posts: 8,376
Quote:
Originally posted by don't ask
He should wound one in the leg and then chase the other.
So which one do you shoot in the leg?
  #11  
Old 01-01-2004, 07:01 PM
ltfire ltfire is offline
Charter Member
 
Join Date: Dec 2002
Location: E 161 St. and River Ave.
Posts: 1,757
Quote:
Originally posted by drachillix
So which one do you shoot in the leg?
Well, the black one, of course. It's well known that they are bred to run faster.
  #12  
Old 01-01-2004, 07:39 PM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Omphaloskeptic, the more I think about it the more confused I am and I arrive at contradictory results but none are what you say.

One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.
  #13  
Old 01-01-2004, 08:24 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
Quote:
Originally posted by sailor
Omphaloskeptic, the more I think about it the more confused I am and I arrive at contradictory results but none are what you say.

One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.
No, the presence of these two people at the scene does in fact yield new information: that they are the most likely suspects. Before you saw these two, your list of suspects was the 10000 people in the city, each with probability 0.01%. Now it's (simplifying by pretending these are the only two suspects left) just these two, each with (for now ignoring the information about their race) 50% probability.

Beforehand, your suspicion that it was probably a white guy was weighted by the fraction of white suspects (large relative to the fraction of black suspects). But since all those other whites are no longer suspects, they (along with the rest of the 9998 other people in town) no longer weight the results. What's important now is how likely each of these two individuals is to be a criminal.
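One way to test this intuition is a small Monte Carlo experiment. The sketch below is an illustration, not something from the thread: it assumes the white man and the black man are each drawn at random from their group (criminal with probability 5% and 20% respectively) and keeps only the trials in which exactly one of them is a criminal. The function name, trial count and seed are arbitrary choices.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(black man is the criminal | exactly one of the two is)."""
    rng = random.Random(seed)
    black_guilty = kept = 0
    for _ in range(trials):
        white_is_bad = rng.random() < 0.05   # random white resident
        black_is_bad = rng.random() < 0.20   # random black resident
        if white_is_bad != black_is_bad:     # keep "exactly one criminal"
            kept += 1
            black_guilty += black_is_bad
    return black_guilty / kept

estimate = simulate()
print(f"estimated P(black guilty) = {estimate:.3f}  (19/23 = {19/23:.3f})")
```

With these assumptions the estimate should land close to 19/23, roughly 0.83, rather than anywhere near the 30% suggested by counting criminals alone.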
  #14  
Old 01-01-2004, 08:37 PM
drachillix drachillix is offline
Member
 
Join Date: Jun 2000
Location: 192.168.0.1
Posts: 8,376
Quote:
Originally posted by sailor
One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.
Or another analysis, if I understand Omphaloskeptic's post correctly:

With no information other than a crime scene, you would assume it was 70% likely a white suspect because whites commit 70% of the crimes. However, a black person is among the two individuals leaving the scene, and by your stats any given black guy is about 4 times as likely to be a criminal as any given white guy.

This seems to correlate with the results of 4/23 chance of white perp vs 19/23 chance of black perp.
  #15  
Old 01-01-2004, 09:02 PM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
I'm going to have to think about that. I say this because I remember stumbling over another probability problem in another thread a long time ago. It was the Monty Hall 3 door problem. I argued adamantly and suddenly, a little light turned on and I realized I was wrong. But it took quite some time and effort. This time I am going to be more prudent, especially because I am even less sure this time. Let me give it some thought.
  #16  
Old 01-01-2004, 09:19 PM
KP KP is offline
Guest
 
Join Date: Sep 1999
Actually, there are a whole passel of assumptions here. Please indulge me as I toy with some of them. If you just want the "meat" [genuine objection] skip to the bolded "Final Analysis".

ISSUE #1: you presume "Criminals" are always guilty, and "Honest men" never are - all criminals are born guilty, and all honest men are forever sainted. This is necessarily false, and weakens the derivation above. In real life, of course, all men are born innocent, and some become guilty at some point. Without knowing the rate of (convicted) first offenses - the rate at which honest men become criminals - we can only guess (or approximate) the actual probabilities.

ISSUE #2: what do real cops think if they see two men running away from the scene of a crime? They think both are guilty! I'll get back to this point.

ISSUE #3: If all guilty men are "always guilty" then we get some truly funky situations. If a penny is stolen from the till, every criminal in the vicinity must be guilty - a neat trick, and a ludicrous assumption for the real world.

You may argue that the unstated principle ('assumption' is more like it) that only one is guilty is implicit in the formulation of this problem (and similar problems of its class). But that forces a revision of the math.

You may say "the crime was murder, and only one gunshot was heard" (we'll ignore the crime of 'fleeing the scene' - which only casts more doubt on the presumption that "only criminals commit crimes"). We still have to refine the derivation.

If both men were chosen at random, then a Real Cop's initial assumption (both are criminals for fleeing the scene) is correct 1% of the time, and the "reader's assumption" (there is only one criminal) is correct 23% of the time - but 76% of the time there'd be no victim at all!

The apparently relevant denominator is ambiguous. It's either
A) "all cases where at least one man is a criminal" (which I think is the only physically arguable case); or
B) "all cases where only one man is a criminal" (whose only merit is conforming to a common presumption)

Pa = 450/9000 + 200/1000 - [(450/9000) * (200/1000)] = 24%
(the subtracted term removes the 'double-counted' overlap)
Pb = 450/9000 + 200/1000 - 2*[(450/9000) * (200/1000)] = 23%
(the subtracted term removes *both* counts of the overlap)
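The percentages quoted here (1%, 23%, 24%, 76%) can be checked with a few lines of Python. This is a sketch that assumes, as the post does, that the two men are drawn independently at random from their respective groups; the variable names are illustrative.

```python
from fractions import Fraction

p_white = Fraction(450, 9000)   # 5% of whites are criminals
p_black = Fraction(200, 1000)   # 20% of blacks are criminals

both = p_white * p_black                        # both men are criminals
at_least_one = p_white + p_black - both         # Pa: overlap removed once
exactly_one = p_white + p_black - 2 * both      # Pb: overlap removed twice
neither = (1 - p_white) * (1 - p_black)         # no criminal in the pair

for name, value in [("both", both), ("at least one (Pa)", at_least_one),
                    ("exactly one (Pb)", exactly_one), ("neither", neither)]:
    print(f"{name}: {float(value):.0%}")
```

The four cases are consistent with one another: "exactly one" plus "both" gives "at least one", and adding "neither" brings the total to 100%.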

In either case, we can only rely on the prevalence of "guilty" men in each racial subgroup, 20% (B) vs 5% (W) - but the analysis is completely flawed, because in a random matching of candidates the actual murderer would be White roughly 450/650 of the time. Why were we wrong?


FINAL ANALYSIS
--------------------
The flaw is: crimes are committed solely by criminals. Any number which includes innocent people merely obfuscates the issue. This includes statistics like "percentage of criminals" or "total population", which are affected by the number of innocents. Changing the number of innocents does not affect the probability that a man is guilty.

The statistically valid denominator is the number of potential murderers in town. All the Chinese in China, or all the innocent Chinese in town are irrelevant.

ISSUE #4: the problem was set up to demand a black man and a white man at the scene. The universe of random pairings, however, does not reflect the underlying events: each black man, guilty or innocent, is "forced to flee" 9-10.6875 times as many hypothetical crime scenes as each white man. This dramatically overestimates the possibility of black guilt.

EFFECT 1: In the universe of cases where a black man is guilty, the analysis provided pairs him against 9000 white men [case A] or 8550 innocent white men [case B], while each white man is mathematically paired against only 1000 or 800. If you wrote a chart of every "random pairing", each black man's name would appear either 9000 or 8550 times, while each white man's name would only appear on the chart 1000 or 800 times. Throw a dart at this chart, and the result isn't "fair" - the black men have 9000/1000 or 8550/800 times as many slips in the hat.

To illustrate this, make the numbers more extreme. Say that there are only 10 blacks in the city (and 1 black criminal) while there are 10 million whites (and 50,000 white criminals). By (improper) Bayesian analysis, that black man must commit virtually all the crime that occurs in his vicinity, while the 50,000 white criminals sit on their hands. Within a year (before new stats can be issued), every black person would be shot many times, and no white man would be caught in this situation.
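To put rough numbers on this, the sketch below computes the "slips in the hat" ratios for the town of X and then applies the pairing-based Bayesian calculation from earlier in the thread to the extreme town. It is an illustration under assumed conditions (every black/white pairing equally likely, exactly one of the pair guilty), not a calculation that appears in the post, and the variable names are made up.

```python
from fractions import Fraction

# "Slips in the hat" ratios for the original town of X (Effect 1).
ratio_case_a = Fraction(9000, 1000)   # all whites vs all blacks
ratio_case_b = Fraction(8550, 800)    # honest whites vs honest blacks
print(float(ratio_case_a), float(ratio_case_b))  # 9.0 10.6875

# Extreme town: 10 blacks (1 criminal), 10 million whites (50,000 criminals).
bad_black, honest_black = 1, 9
bad_white, honest_white = 50_000, 10_000_000 - 50_000

# Pairings with exactly one criminal, each pairing weighted equally.
black_guilty = bad_black * honest_white
white_guilty = bad_white * honest_black
p_black = Fraction(black_guilty, black_guilty + white_guilty)

print(p_black, f"{float(p_black):.3f}")  # 199/208 0.957
```

Under those assumptions the lone black criminal's town yields a posterior of about 96% against the black suspect in any mixed pair, which is the behaviour the post is objecting to.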

EFFECT 2: Relying on prevalence in subpopulations will skew all future statistics, even if the population sizes are equal. In cases where men of both races are suspected, the black criminals will always be caught [and be counted], and the white criminals will always escape [and not be counted]. This effect is strongest as the population with the highest pre-existing

Effect 1 increases as the black fraction of the total population decreases. Effect 2 increases as the black fraction increases. Racial profiling is a Big Lose for blacks, guilty or innocent.
  #17  
Old 01-01-2004, 09:59 PM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Well, at least the rest of us understand the problem as it was enunciated even if we're still having problems finding or understanding the solution.
  #18  
Old 01-01-2004, 10:07 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
Quote:
Originally posted by KP
EFFECT 2: Relying on prevalence in subpopulations will skew all future statistics, even if the population sizes are equal. In cases where men of both races are suspected, the black criminals will always be caught [and be counted], and the white criminals will always escape [and not be counted].
This is true, and it is undeniably a problem with profiling schemes. It would be nice to have an equal chance of catching all criminals (whether black or white) and to prevent white criminals from gaming the system by committing their crimes when blacks are around (thus hassling innocent blacks), but it would also be nice to maximize our chances of catching criminals. These two desires are in conflict here, and there's no simple resolution; they can't simultaneously be maximized for the problem stated.

There's a reason my first response stopped where it did (with the computation of the posterior probabilities) and not with a complete answer to the OP's question
Quote:
Originally posted by sailor
What is the true, correct and definitive answer?
which is, as you point out (I only mentioned it in passing) a much more difficult question, and not one with a GQ answer. What do you want to maximize?

Quote:
Originally posted by KP
Racial profiling is a Big Lose for blacks, guilty or innocent.
Well, yes, but black crime is (in the real world; the statistics in the OP don't cover victims) also a Big Lose for blacks, guilty or innocent. It's not clear, to me at least, which causes worse problems in practice.
  #19  
Old 01-01-2004, 10:12 PM
KP KP is offline
Guest
 
Join Date: Sep 1999
My point is that the problem, as it is enunciated, is flawed.

The "understanding" you cite is what creates the confusion. My final analysis points out why.

Sorry about the rest of the stuff, if it didn't interest you. I probably should have taken the racial example less literally. However, picking assumptions apart is not irrelevant. It is an essential first step in mathematical analysis. If this were a more strictly mathematical forum, the assumptions would have been picked apart a lot more already. That's just part of how the game of math (vs. arithmetic) is played.
  #20  
Old 01-01-2004, 10:41 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
Quote:
Originally posted by KP
My point is that the problem, as it is enunciated, is flawed.

The "understanding" you cite is what creates the confusion. My final analysis points out why.

Sorry about the rest of the stuff, if it didn't interest you. I probably should have taken the racial example less literally. However, picking assumptions apart is not irrelevant. It is an essential first step in mathematical analysis. If this were a more strictly mathematical forum, the assumptions would have been picked apart a lot more already. That's just part of how the game of math (vs. arithmetic) is played.
I don't think the problem is necessarily flawed. If you wanted to use this Bayesian analysis as a Rigorous Proof of The Efficacy And Rightness of Profiling, well, that would be a problem, yes. But the question, at least as I understood it, was somewhat more limited in intent than that: just a tool for understanding the mathematical reasons behind profiling.
  #21  
Old 01-02-2004, 12:06 AM
viking viking is offline
Guest
 
Join Date: Dec 2002
Then again, the problems raised so far can also be used to understand The Problems With Profiling.

The guys could be running from the scene because they are afraid of being wrongly harassed by the police. And in fact we would expect this to be more of a factor for the member of the group that profiling suggests we should harass. So, the person that profiling says we should harass is running because he knows he's going to get blamed, and the guy that profiling says we shouldn't harass is running because he's guilty. So the game theory approach to the problem says we should catch the guy that we don't think we should catch.

And suddenly the reasoning starts reminding me way too much of The Princess Bride
  #22  
Old 01-02-2004, 12:26 AM
DanBlather DanBlather is offline
Guest
 
Join Date: Jul 2001
Luis Tiant, a Cuban-born pitcher for the Red Sox, was stopped for jogging in the predominantly white neighborhood in which he lived. The cop's reasoning was that a black man running in that vicinity must be fleeing from something.
  #23  
Old 01-02-2004, 08:33 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Please take your hijacks to GD

Please! This is not a discussion about racial profiling so I would appreciate it if those wishing to discuss racial profiling would take their discussion to another thread. My question is 100 percent a probability problem and the relevant data is given in my OP. The question is: given the information the cop has, what can he infer with regards to who is the most likely suspect? That is the question. The fact that in the real world there is no town named X, or other similar real-world considerations, are totally outside the problem I am presenting. I am interested in learning about statistics, not about the racial problems of America.
  #24  
Old 01-02-2004, 09:21 AM
Napier Napier is offline
Charter Member
 
Join Date: Jan 2001
Location: Mid Atlantic, USA
Posts: 7,218
>A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both so he has to choose which one to go after.


>This is not a discussion about racial profiling

sailor, you strain me!
  #25  
Old 01-02-2004, 09:54 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Q. Delta flight #45678 leaves Washington DC and flies towards its destination located 6000 miles away at a speed of 2000 mph. How long will it take to get there?

A. There is no Delta flight #45678
There is no airport located in the District of Columbia
No commercial airliner can fly at 2000 mph

You think that would be considered the correct answer in most schools and colleges?

It seems you cannot mention race, guns, abortion, religion and a number of other hot issues, no matter how in passing, without the thread being hijacked to hell by people who want to argue their pet subjects. Please take it to another thread. This is about mathematical probabilities.
  #26  
Old 01-02-2004, 10:37 AM
hroeder hroeder is offline
Guest
 
Join Date: Jan 2001
Common sense math answer that will surely infuriate mathematicians:

There's a 9% chance that of the two men one will be white and one will be black.

So we now know that this combination is rare in this town.

Calculating the probability that the Black dude will be a criminal gives a 1.8% possibility looking at the entire population; or a 2.9% possibility looking at the criminal population.

Calculating the probability that the White dude will be a criminal gives a 4.5% possibility looking at the entire population. It gives a 5.4% possibility looking at the population of criminals.

Thus about 5-2 odds that chasing the White dude will get the cop a collar, given no other information.
  #27  
Old 01-02-2004, 10:48 AM
pjd pjd is offline
Guest
 
Join Date: May 2003
ALL RIGHT, ALL RIGHT STOP IT !

IT WAS ME, OK ?

I admit it.
  #28  
Old 01-02-2004, 11:08 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
hroeder, you have thrown in a new perspective which had not occurred to me and which makes sense but I think you may have the numbers wrong. Let me do some math and see what I get.
  #29  
Old 01-02-2004, 11:16 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
A) The white guy did it. Then the black guy can be *any* of the 1000 black guys in town, good or bad.

B) The black guy did it. Then the white guy can be *any* of the 9000 white guys in town, good or bad.

So, we have 9000 cases in which the black guy did it versus 1000 cases in which the white guy did it. The reasoning seems correct to me and yet the conclusion seems wrong in that the process does not even take into account the numbers of bad guys in each group. I assume the process is wrong but I cannot put my finger on it. Someone?
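One possible way to repair this count, sketched here under assumptions that are not stated in the post: if exactly one of the two men is the criminal, then the other man must be honest, so the innocent party can only be one of the honest members of his group, and the guilty party must be one of the bad ones. Counting pairings that way (and treating every such pairing as equally likely) brings the bad-guy numbers back in. The variable names are illustrative.

```python
from fractions import Fraction

honest = {"white": 8550, "black": 800}
bad = {"white": 450, "black": 200}

# Exactly one of the pair is a criminal, so the other must be honest.
white_guilty_pairs = bad["white"] * honest["black"]   # 450 * 800
black_guilty_pairs = bad["black"] * honest["white"]   # 200 * 8550

total = white_guilty_pairs + black_guilty_pairs
p_white = Fraction(white_guilty_pairs, total)
p_black = Fraction(black_guilty_pairs, total)

print(white_guilty_pairs, black_guilty_pairs)  # 360000 1710000
print(p_white, p_black)                        # 4/23 19/23
```

Counted this way, the pairings agree with the 4/23 and 19/23 figures from the Bayesian calculation earlier in the thread.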
  #30  
Old 01-02-2004, 11:26 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
I still can't put my finger on it but it seems more reasonable to use strictly the numbers of bad guys in which case the probability is 70% that the white guy did it. I feel quite confident with that.

At least I suppose that is a valid first analysis. Then I suppose it could be refined but I still doubt the result got by Omphaloskeptic that the probability is only 17% that the white guy did it. I think there *has* to be some error along the way.

The answer may be neither number but right now, if I had to pick between 17% and 70% I would pick 70% that it was the white guy. I would also bet that the correct answer is not higher than 70% but not as low as 17%.
  #31  
Old 01-02-2004, 11:49 AM
KP KP is offline
Guest
 
Join Date: Sep 1999
Hroeder. I largely agree, but with a significant distinction.

There are 450 white criminals and 200 black criminals. With no data on any differences in extent of the criminal records of black vs. white criminals, we can only consider them equally "criminal", making the odds 450:200 (69.23%) that any crime is committed by a white.

As I said earlier (but apparently, not well): the number or race of innocent people is irrelevant. By the assumptions of the problem, if the entire town were present at a speech and the speaker was assassinated, you'd have 650 suspects, not 10,000. If 1000 (honest) FBI agents were present, there are still 650 suspects, not 11,000. If the town's 1000 Quakers, black and white, are never criminals, and didn't attend, there are still 650 suspects, not 9000. If "all criminals are equal", then there's a 69.23% chance the assassin is white, even if *everyone* flees the scene in fear.

If ONLY criminals commit crimes, only criminals are relevant. It's always possible to calculate the criminal %age for any group (e.g. by last digit of SSN), but that merely compares the irrelevant innocents to the relevant criminals.

The population of China is also innocent of this crime. Should we use the ratio of black criminals to native Chinese to decide between the black and white suspect? The fact is: the ratios of black and white criminals to native Chinese actually give the right answer! Why? Because we're using the *same* denominator to compare both sets of criminals. "Innocents of the same race" or "total population of same race" are inappropriate, skewing denominators.
  #32  
Old 01-02-2004, 11:59 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Yes, I think KP's analysis is right. At least initially the odds are 450/200 that a white guy did it and that is the conclusion if the officer arrives at the scene and sees no one.
The question is whether seeing a white guy and a black guy adds any relevant information which would alter the odds and that is where I am not clear but right now I can't see how it does.

If it did, it would lower the figure somewhat, but I can't see how it could lower it to anything close to 17%; even 50% would seem a stretch.
  #33  
Old 01-02-2004, 02:03 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
OK, let's change the parameters and see if the conclusion (that the odds are 450:200) makes sense.

Here are the original statistics (corrected):
Code:
        Honest    Bad   Total 
White:    8550    450    9000
Black:     800    200    1000
Totals:   9350    650   10000
Now let's consider an extreme case (Example 2 for reference):
Code:
        Honest    Bad   Total 
White:    9350    450    9800
Black:       0    200     200
Totals:   9350    650   10000
Notice that I have not changed the "Bad" column at all. I've only changed the 800 honest blacks to whites; in our new fictional town of East X there are no honest blacks. Is it still the case that the odds are 450:200 that a white guy did it, even though the black guy is a guaranteed criminal?

I can make this even more extreme by adding honest whites (Example 3):
Code:
        Honest    Bad   Total 
White:  999350    450  999800
Black:       0    200     200
Totals: 999350    650 1000000
In the bustling metropolis of Lower East X, with a white criminal proportion of 0.045% and a black criminal proportion of 100%, there are still 450 bad whites and 200 bad blacks. Is it still 450:200 for the white guy?

I'm trying to come up with a more intuitive example, but (to me at least) these two extreme cases make the 450:200 odds seem very unreasonable. Let me try another explanation of the new information: You have two populations (whites and blacks). The relevant feature of these populations is that the smaller population has a higher proportion of criminals. You see two suspects at the scene, and (by the tacit assumption in the problem) exactly one is guilty. The fact that exactly one of the suspects from the smaller, higher-crime population is at the scene is relevant because it is a relatively unlikely occurrence.
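A minimal sketch of the calculation behind these three examples, assuming (as the tacit assumption above states) that exactly one of the two suspects is a criminal and that each suspect is a random member of his own population. The function name and the pair-counting approach are illustrative choices, not something given in the problem.

```python
from fractions import Fraction

def p_black_guilty(white_honest, white_bad, black_honest, black_bad):
    """P(the black suspect is the criminal), given one suspect of each colour
    and assuming exactly one of the two is a criminal."""
    # Count (white, black) pairs in which exactly one member is bad.
    black_is_bad = white_honest * black_bad
    white_is_bad = white_bad * black_honest
    return Fraction(black_is_bad, black_is_bad + white_is_bad)

examples = {
    "Original":  (8550, 450, 800, 200),
    "Example 2": (9350, 450, 0, 200),
    "Example 3": (999350, 450, 0, 200),
}

for name, counts in examples.items():
    p = p_black_guilty(*counts)
    print(f"{name}: P(black guilty) = {float(p):.4f}")
```

Under these assumptions the original town gives 19/23 (about 83%) for the black suspect, and both extreme examples give certainty, even though the "Bad" column never changes.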
  #34  
Old 01-02-2004, 02:12 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
Quote:
Originally posted by KP
As I said earlier (but apparently, not well): the number or race of innocent people is irrelevant. By the assumptions of the problem, if the entire town were present at a speech and the speaker was assassinated, you'd have 650 suspects, not 10,000. If 1000 (honest) FBI agents were present, there are still 650 suspects, not 11,000. If the town's 1000 Quakers, black and white, are never criminals, and didn't attend, there are still 650 suspects, not 9000. If "all criminals are equal", then there's a 69.23% chance the assassin is white, even if *everyone* flees the scene in fear.

If ONLY criminals commit crimes, only criminals are relevant. It's always possible to calculate the criminal %age for any group (e.g. by last digit of SSN), but that merely compares the irrelevant innocents to the relevant criminals.

The population of China is also innocent of this crime; should we use the ratio of black criminals to native Chinese to decide between the black and white suspect? The fact is: the ratios of black and white criminals to native Chinese actually give the right answer! Why? Because we're using the *same* denominator to compare both sets of criminals. "Innocents of the same race" or "total population of same race" are inappropriate, skewing denominators.
The reason that the innocent populations are not irrelevant here is that there are two people at the scene, one guilty and the other innocent. This is statistically more likely if the guilty man is from a more-guilty population, but also if the innocent man is from a more-innocent population.

(The innocent population of China, etc., is irrelevant to the question not because of its innocence but because it is not part of the problem universe. Charles Manson is also irrelevant to the question for the same reason, even though he's guilty.)
  #35  
Old 01-02-2004, 05:35 PM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
I have to admit I am totally perplexed by what seems such a simple probability problem. Several analyses all seem correct to me and yet they cannot all be correct as they lead to contradictory results. Then I can find fault with all of them. As I say, I am perplexed.
  #36  
Old 01-02-2004, 08:08 PM
KP KP is offline
Guest
 
Join Date: Sep 1999
Omphaloskeptic makes a valid point. I'm not sure I agree with his explanation, but his numeric argument is solid.

I've played with some numbers, and it appears that there is some effect beyond the population size effect [Effect #1 in my original post] that seems to be nonlinear - perhaps a factor of x/(1-x) - "the ratio between empty space and filled space" in a fixed size container. Such equations behave differently when the compared quantities are near each other ["in the same regime"] vs. when they are far apart.

I overlooked this factor in the original example where both quantities were in the same regime. I'll need to work out a single unified equation to understand in which regimes the various factors predominate. I'll report back then.

But again: Omphaloskeptic is right: I was wrong to say the number of innocents is completely irrelevant. (We all know what happens when people ignore, bury or forget their mistakes, so I try to emphasize mine.)
  #37  
Old 01-02-2004, 08:23 PM
Achernar Achernar is offline
Guest
 
Join Date: Aug 1999
I don't know Bayesian math, but it seems to me that whatever solution is correct must fulfill the following criteria, if B is the fraction of blacks which are bad and W is the fraction of whites which are bad, regardless of the overall populations:
  1. If B = W, then color and honesty are independent variables, and so it's exactly 50/50.
  2. If B = 0, then it's 100% certain that the cop should pursue the white one, and if W = 0, then it's 100% certain that the cop should pursue the black one.
  3. If B = 1, then it's 100% certain that the cop should pursue the black one, etc.
  4. If B = W = 1 or B = W = 0, then the best choice is undefined.
Does anyone disagree with these? Because some of the solutions so far contradict them.
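As a rough way to test a proposed solution against these four criteria, here is a sketch that checks one candidate rule: the probability that the black suspect is the bad one, given that exactly one of the two is bad. The function name is made up for illustration, and the candidate rule is only one of the answers floated so far, not a settled result.

```python
from fractions import Fraction

def chase_black_probability(b, w):
    """Candidate rule: P(black suspect is bad | exactly one of the two is bad).

    b and w are the fractions of blacks and whites that are bad.
    Returns None when "exactly one is bad" cannot happen (criterion 4).
    """
    black_only = b * (1 - w)   # black bad, white honest
    white_only = w * (1 - b)   # white bad, black honest
    total = black_only + white_only
    if total == 0:
        return None
    return black_only / total

fifth, twentieth = Fraction(1, 5), Fraction(1, 20)

print(chase_black_probability(fifth, fifth))                # criterion 1: B = W
print(chase_black_probability(Fraction(0), twentieth))      # criterion 2: B = 0
print(chase_black_probability(fifth, Fraction(0)))          # criterion 2: W = 0
print(chase_black_probability(Fraction(1), twentieth))      # criterion 3: B = 1
print(chase_black_probability(Fraction(1), Fraction(1)))    # criterion 4
```

Any other proposed formula can be dropped into the same function and run through the same five calls to see which criteria it violates.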
  #38  
Old 01-02-2004, 09:42 PM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Achernar, in simple terms what you are saying is that the cop should look at the probabilities that each individual is bad. Your assertion makes sense from a certain point of view and this is the problem I am having: different POVs yield different answers. Look:

The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys. So he should chase Mr White. The logic seems irrefutable.

But now he forgets about the crime when he sees the two guys and thinks (as you point out) the probability that a black individual is a bad guy is four times higher than the probability that a white guy is bad. So he should chase Mr Black.

Both reasonings look sound to me and yet they can't both be true because they are contradictory.

Probably the correct answer to the probability of the white guy being who did it lies somewhere between the upper 70% and the lower 20% which each scenario yields. But that is still a huge range and I do not know what logical process would combine both factors to give the correct answer.
  #39  
Old 01-02-2004, 10:04 PM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
Quote:
Originally posted by Achernar
I don't know Bayesian math, but it seems to me that whatever solution is correct must fulfill the following criteria, if B is the fraction of blacks which are bad and W is the fraction of whites which are bad, regardless of the overall populations:
  1. If B = W, then color and honesty are independent variables, and so it's exactly 50/50.
  2. If B = 0, then it's 100% certain that the cop should pursue the white one, and if W = 0, then it's 100% certain that the cop should pursue the black one.
  3. If B = 1, then it's 100% certain that the cop should pursue the black one, etc.
  4. If B = W = 1 or B = W = 0, then the best choice is undefined.
Does anyone disagree with these? Because some of the solutions so far contradict them.
Case 4 is contained in case 1, isn't it?
  #40  
Old 01-02-2004, 10:11 PM
Achernar Achernar is offline
Guest
 
Join Date: Aug 1999
Well, as I wrote it, case 4 contradicts specific cases of 1, 2, and 3. But I meant for it to supersede them.
  #41  
Old 01-02-2004, 10:18 PM
Omphaloskeptic Omphaloskeptic is offline
Guest
 
Join Date: Oct 2001
I'm trying to write a more detailed explanation of my original (Bayesian) answer, but for now let me try to explain why I think your 70% solution is wrong:
Quote:
Originally posted by sailor
The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys.
This is fine so far. In the absence of any other information, his universe of suspects is all criminals in town, ~70% of whom are white.
Quote:
So he should chase Mr White. The logic seems irrefutable.
This is where I think the problem lies. Once he sees the two people running from the scene, these are his two suspects (in reality, these would be "primary" suspects and there would still be some suspicion cast on criminals not at the scene, but I've ignored this). He's not choosing any more whether to chase all 450 white criminals or all 200 black criminals, just whether to chase this one particular white (who may be a criminal) or this one particular black (who may be a criminal).

I hesitate to bring up a different example, because I think analogies usually just confuse the issue, but here's my attempt (NB: this is primarily aimed at the specific problem I see with the reasoning above, and not as a complete analogy):
Quote:
I have a large number of nickels and dimes, all minted in either 2001 or 2002; 20% of the nickels and 5% of the dimes are dated 2001. From this stock I (randomly) pick 30 dimes and 3 nickels, placing them face down on the table in front of you. I allow you to turn over either all the dimes or all the nickels. If you want to find a 2001 coin, which should you choose?
SPOILER:
Clearly you turn over the dimes; though each individual dime is less likely to be a 2001, there are ten times as many, enough that there is more likely a 2001 dime on the table than a 2001 nickel.


Now I (randomly) remove all but a single nickel and a single dime from the table. I peek at their heads and tell you that exactly one is a 2001 coin. Which coin do you turn over to find the 2001?
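For anyone who wants to check the arithmetic of the coin analogy, here is a small sketch. It assumes the stock of coins is large enough that each coin's date is independent of the others, and that "exactly one is a 2001 coin" is simply a condition on the remaining pair; the variable names are illustrative.

```python
from fractions import Fraction

p_nickel = Fraction(1, 5)    # 20% of nickels are dated 2001
p_dime = Fraction(1, 20)     # 5% of dimes are dated 2001

# Part 1: chance of at least one 2001 coin among 30 dimes vs. among 3 nickels.
any_dime = 1 - (1 - p_dime) ** 30
any_nickel = 1 - (1 - p_nickel) ** 3

# Part 2: one nickel and one dime remain, and exactly one of them is a 2001.
nickel_only = p_nickel * (1 - p_dime)
dime_only = p_dime * (1 - p_nickel)
p_nickel_is_2001 = nickel_only / (nickel_only + dime_only)

print(f"At least one 2001 dime:   {float(any_dime):.3f}")
print(f"At least one 2001 nickel: {float(any_nickel):.3f}")
print(f"Nickel is the 2001 coin:  {float(p_nickel_is_2001):.3f}")
```

With these numbers the dimes win the first round as a group, but in the second round the single nickel is the 2001 coin with probability 19/23.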
  #42  
Old 01-02-2004, 10:19 PM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
If B = W = 1, then both men are criminals, so chasing either man could result in catching a criminal, if not the perpetrator of this crime. So the best choice is to chase one of them--perhaps the one who is running more slowly.

If B = W = 0, then neither man is the criminal, but may be a witness. Again, the best choice is to chase one of them.

So it looks to me like case 1 contains case 4, and cases 2 and 3 should be taken to exclude B = W.

But I do agree that any solution must have this property.
  #43  
Old 01-02-2004, 10:36 PM
Achernar Achernar is offline
Guest
 
Join Date: Aug 1999
Okay, I agree ultrafilter. I was thinking that the cop assumed that exactly one of the two was bad, in which case Case 4 would contradict this assumption. But I realize now that this assumption is not necessary.
  #44  
Old 01-03-2004, 05:06 PM
KP KP is offline
Guest
 
Join Date: Sep 1999
I haven't had a chance to fully work out the unified equation, but I did come up with a few thoughts I thought I'd throw out. (I'm still working on these, too)

The key concept here seems to be "the universe of (applicable) possibilities" or the "gross denominator". When making a chart of possibilities, you must pick the correct rows and columns, or "counting cells" will mislead you.

As I earlier noted, making a matching grid of blacks vs whites produces a false bias. In such a grid, each additional white person increases the number of cells in each black criminal's row, but not any white criminal's column. But does a black man actually become "more probably guilty" if a white person moves to town?

To remove any obscuring 'intuitions', let's rename the conditions. The town is a former all-boys' school. Blacks and whites are girls and boys. 'Criminals' are 'drivers'. The crime becomes an accident where a truck hits a car, killing the driver, but not the one passenger. We'll assume all dating is in-school and heterosexual.

Just as "only criminals commit crimes", only drivers drive, but a license doesn't prove you weren't a passenger. A criminal record doesn't prove guilt in any later crime.

In this example, it's easier to see that to assess the odds that a girl died, you don't multiply by the number of "available boys" (vs. the girls available to the guys). The date happened. There was ample opportunity for it to happen (the number of potential partners amply exceeds the number of drivers). Since it is not a limiting condition, you should leave it alone. (Below, we'll see how Example C hits a limiting condition.)

Statistical "opportunity" can be illusory: pedestrians aren't run over 10x as much in cities with 10x more roads; in fact the accident rate is often higher with fewer roads. Dating -or having innocents near your crime- usually isn't limited by the number of possible partners, so the effect of more potential partners isn't calculable. When you hear of the accident, knowing that the accident was on a date (vs. with a same sex friend) may affect your assessment on a person-by-person (not gender) basis.

The fraction of licensed boys vs. girls (5% vs 20%) DOESN'T affect the probable gender of the victim. The fraction of drivers who are boys (69.23%) vs girls (30.77%) DOES.

"Licensure rate by gender" (criminality by race) is a sloppily framed statistic which could only be used if we felt we 'needed' to judge by raw gender, just as the original scenario was crafted to FORCE us to judge by race: the only answers we're allowed to give are "black" or "white".

An insurance company would go broke if they used "accidents per girl" instead of "Accidents per girl driver" to calculate rates. The scenario makes it sound like the cop MUST decide based on race, but in fact, he could chase the one who is closer, slower, wearing lighter colored clothing (easier to see at night), looks easier to subdue, is headed toward less concealing cover, or even choose one at random. A cop who sees two fleeing suspects and sees only race is a poor cop indeed.

Now let's remove race intuitions from Omphaloskeptic's most extreme example (his Example 3, which I'm calling Example C):

It's a post-Apocalyptic future after a cruel bioweapon killed almost all women. By tradition, all women drive (at first, they didn't dare ride with a man!) but almost no men are allowed to drive (they might catch the few women). After a century or so, women are no longer afraid; they are worshipped and protected.

It's very rare to see a man alone with a woman (women are 1/5000th of the city). Yet one day, a paper reports that -horrors- an accident killed the driver of a car containing a woman. The whole city wants to know: did a woman die?

Like the original scenario, it is an unlikely event cherry-picked to make a point, but does the extreme rarity change the conclusion we established above?

No, it doesn't! While I agree that, this time, it was probably a woman who was killed (a black who was guilty), the 100% prevalence of driving among women (criminality among blacks) is actually quite irrelevant.

The apparently contradictory finding of Example C is not caused by the HIGH 100% rate, but by the ULTRA-LOW rates: prevalence of women, and prevalence of driving males (small number of blacks, and almost total noncriminality of whites in Example C).

To see this, let's see how changing the 100% rate affects the "most probable outcome":
Code:
DROPPING BLACK CRIMINAL RATE FROM 100% to 0.05% DOESN'T AFFECT EXAMPLE C
EVEN AT RATES SO LOW THAT NOT ONE SINGLE BLACK CRIMINAL EXISTS

            TOTAL   CRIMINALS      HONEST    Racial Prevalence    % CRIM
BLACK         200         200           0    1: 5000              100%
WHITE      999800         450      999350    1: 1.00020004        0.0450090018%
TOTAL     1000000         650      999350                         0.065%

   B+W: 200.05      B guilty: 199.96            W guilty:  0.09

            TOTAL   CRIMINALS      HONEST    Racial Prevalence    % CRIM
BLACK         200          20         180    1: 5000              10%
WHITE      999800         450      999350    1: 1.0002000400      0.0450090018%
TOTAL     1000000         470      999530                         0.047%

   B+W: 20.086      B guilty: 19.996            W guilty:  0.09

            TOTAL   CRIMINALS      HONEST    Racial Prevalence    % CRIM
BLACK         200           2         198    1: 5000              1%
WHITE      999800         450      999350    1: 1.0002000400      0.0450090018%
TOTAL     1000000         452      999548                         0.0452000000%

   B+W: 2.0896      B guilty: 1.9996            W guilty:  0.09

            TOTAL   CRIMINALS      HONEST    Racial Prevalence    % CRIM
BLACK         200         0.2       199.8    1: 5000              0.1%
WHITE      999800         450      999350    1: 1.0002000400      0.0450090018%
TOTAL     1000000       450.2    999549.8                         0.0450200000%

            TOTAL   CRIMINALS      HONEST    Racial Prevalence    % CRIM
BLACK         200         0.1       199.9    1: 5000              0.05%
WHITE      999800         450      999350    1: 1.0002000400      0.0450090018%
TOTAL     1000000       450.1    999549.9                         0.0450100000%

   B+W: 0.18998     B guilty: 0.09998           W guilty:  0.09
As you can see, the "criminality of blacks" is irrelevant in Omphaloskeptic's Example C. Even when there is not one single black criminal, but 450 white criminals, the ultra-low white rate makes black men "the likeliest candidate". I don't know what "0.1" black criminal is, but it's way less than "one single criminal".

Apparently even "bad thoughts" by a black man should affect a cop's decision of whom to chase, more than 450 actual White criminals, if the "white crime rate" is low enough.

Such 'small number effects' are non-linear enough to constitute a deliberately skewed sampling: e.g. a black population this tiny is deprived of the "statistical benefit" of "B/B" scenarios. At a black racial prevalence of 1:5000, the B/B effect is 0.00000004, while 99.98% of white crime is buried in W/W scenarios.

This makes Example C so sensitive to black misdeeds, and so forgiving of the prospect of white misdeeds that it actually says you should arrest the black when there isn't a single black criminal, but there are hundreds of white criminals.

In fact, under Example C you could raise the White criminal rate to 100%, and the answer still wouldn't be "arrest the white man".

Interestingly, the most extreme case of an "Example C" scenario is "the only Chinese in town": if that person has a criminal record, Example C says he should be chased because his visible ethnicity has 100% criminality. Yet, in reality, the policeman should chase the white suspect: he can pick up the Chinese man later, but the white suspect is still unidentified.
Quote:
What is the true, correct and definitive answer? Who should the officer chase?
Just because a quantity can be calculated, doesn't make it a sufficient basis for a decision. Personally, I'm more inclined to 'follow the math' than any other single factor, but I encounter situations daily when the math simply fails to provide the best solution in cases of limited information.

I find this problem interesting mathematically, but I think there is substantial reason to say that the cop's knowledge of racial statistics is no more relevant than a thousand other details he would have also seen. Some cops are known for giving speeding tickets to sports cars or even just "red" cars, but despite studies indicating that these are worse offenders, I would argue that targeting this "high risk population" would be a poorer global practice than ticketing 'at random'.

[I put red in quotes, because I don't have a cite on speeding rates by color.]
  #45  
Old 01-03-2004, 07:15 PM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
>> Just because a quantity can be calculated, doesn't make it a sufficient basis for a decision.

Well, I may agree but I am not having to make any decision. I'd just like to know the answer. I have been stumped by probability problems before but this has to be in lesson 1 of Probability 101. It is the simplest probability case you can imagine, with just two variables. I'm pretty sure there's a correct answer hidden in there somewhere.
  #46  
Old 01-03-2004, 08:24 PM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
I'll agree with Achernar that the limit cases should behave as he specified, so that provides a guide for ruling out some solutions.
  #47  
Old 01-03-2004, 08:31 PM
Achernar Achernar is offline
Guest
 
Join Date: Aug 1999
Quote:
Originally posted by sailor
Achernar, in simple terms what you are saying is that the cop should look at the probabilities that each individual is bad. Your assertion makes sense from a certain point of view and this is the problem I am having: different POVs yield different answers. Look:

The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70% because 70% of crimes in the city are being committed by white guys. So he should chase Mr White. The logic seems irrefutable.
I just thought of a way to refute this. Suppose 10% of people are Scorpios, and, no surprise, 10% of crimes are committed by Scorpios. The cop shows up at the scene, and using this logic, concludes that it's 90% likely that it was not committed by a Scorpio. Checking two suspects' IDs, he sees that one is a Scorpio and the other is a Capricorn. Should he suspect the Capricorn more?
  #48  
Old 01-03-2004, 11:03 PM
Lance Turbo Lance Turbo is offline
Guest
 
Join Date: Aug 1999
You see a black guy and a white guy. There are exactly 9,000,000 possible black guy/white guy pairs.

Of those 9 million happy couples:

6,840,000 are an honest white guy and an honest black guy
1,710,000 are an honest white guy and a dishonest black guy
360,000 are a dishonest white guy and an honest black guy
90,000 are a dishonest white guy and a dishonest black guy

I believe we are assuming that we have one honest man and one dishonest man, so we can throw out 6,930,000 cases of honest/honest and dishonest/dishonest.

That leaves 2.07 million cases of which about 82.6% contain a dishonest black man.

Chase the black dude.
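The pair counts above can be reproduced with a short sketch like the following. It assumes every white/black pair is equally likely to be the two men seen at the scene, and it keeps the same assumption of exactly one dishonest man; the dictionary layout and names are just for illustration.

```python
from itertools import product

whites = {"honest": 8550, "bad": 450}
blacks = {"honest": 800, "bad": 200}

# Number of (white, black) pairs for each combination of honest/bad.
pairs = {(w, b): whites[w] * blacks[b] for w, b in product(whites, blacks)}
total_pairs = sum(pairs.values())

# Keep only the pairs with exactly one dishonest man.
exactly_one_bad = pairs[("honest", "bad")] + pairs[("bad", "honest")]
share_black_bad = pairs[("honest", "bad")] / exactly_one_bad

for (w, b), count in pairs.items():
    print(f"{w} white / {b} black: {count:,}")
print(f"Total pairs: {total_pairs:,}")
print(f"Pairs with exactly one bad guy: {exactly_one_bad:,}")
print(f"Share with a bad black guy: {share_black_bad:.1%}")
```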
  #49  
Old 01-04-2004, 12:24 AM
KP KP is offline
Guest
 
Join Date: Sep 1999
Lance Turbo:

You absolutely CANNOT throw out the dishonest/dishonest cases and get the correct numeric answer; those cases are intrinsic to the problem. However, thus far, it seems likely to me they may not affect your final decision under an algebraic model, except possibly under very extreme discrepancies in population size or prevalence. The question looks to be even trickier under a discrete math model (criminals and citizens must come in integer units).

Quote:
sailor said
but this has to be in lesson 1 of Probability 101. It is the simplest probability case you can imagine, with just two variables.
Well, technically, Probability 101 does tell you that "Probability is only good for predicting the behavior of a large number of samples. It can't predict single incidents, and is poor at small sample sizes." Also, this is *not* a two variable problem. That is at the root of the original "paradox" you cited. It's at least a three variable problem, where the third variable must be calculated by using two of the variables you provided. (see below)

I do understand what you mean, of course, but Probability is always a matter of inexact knowledge. I've been trying to prove to myself whether the 'Probability 101' model is the best possible approximation, the conditions where it is weakest (if any), and whether it either over-assumes or under-uses all available data. I had assumed that was the primary thrust of the thread, since the resolution to your OP has already been given. However, re-reading the thread, it's clear that not everyone is debating the same issues.

I apologize if my focus has caused confusion (apparently it's confused me at least once!), and I'll concede there's more than a small measure of 'Devil's Advocate' in it (I was taught that critical analysis is essential). I don't do it to annoy or mislead - it's actually a fair amount of work!

To make up for that, here's the derivation of...

The "Probability 101" answer:
We have T<w> white candies and T<b> black candies. Some of each are milk chocolate (M) inside and some are dark chocolate (D).

For a randomly selected candy of color c, M<c>:D<c> = odds that it has milk chocolate. The probability P<c> = M<c>/T<c>.

The number of milk chocolates in each color M<c> = P<c>*T<c>

HOWEVER, since M<c>=P<c>*T<c>, M<w> can be greater than M<b> even if P<b> is greater than P<w>, if and only if T<w>/T<b> is greater than P<b>/P<w> [i.e. more crimes can be committed by the less criminal group, if it is large enough]. This is the resolution of the apparent paradox in the OP. (This is a 3-variable problem. You must know P<w>, P<b> and T<b>/T<w>.)
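Plugging the OP's town into this condition is a quick sanity check. The sketch below uses the thread's own numbers; the variable names mirror the T, P and M symbols above and are otherwise arbitrary.

```python
from fractions import Fraction

T_w, T_b = 9000, 1000                        # total whites and blacks
P_w, P_b = Fraction(5, 100), Fraction(20, 100)   # criminal fraction in each group

M_w = P_w * T_w   # number of white criminals
M_b = P_b * T_b   # number of black criminals

size_ratio = Fraction(T_w, T_b)   # how much larger the white population is
rate_ratio = P_b / P_w            # how much higher the black criminal rate is

print(f"M_w = {M_w}, M_b = {M_b}")
print(f"T_w/T_b = {size_ratio}, P_b/P_w = {rate_ratio}")
print("More white criminals:", M_w > M_b)
print("Size ratio exceeds rate ratio:", size_ratio > rate_ratio)
```

Here the population ratio (9) exceeds the rate ratio (4), which is why the group with the lower rate still supplies more of the criminals.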

Independence is not a trivial issue in problems like these. The "Monty Hall paradox" hinges on the issue of whether seemingly independent consecutive options are genuinely independent, and therefore whether they have equal probability.

If I draw a white candy and then a black candy from the bowl, they are independent events. Neither selection affects the other. The odds of each candy being milk chocolate are given by its respective ratio M<c>:D<c> in the bowl. P<c> can also be used.

If I draw two candies together, and then return them to the bowl, until I get a black and a white together, the two candies are still independent draws and the chances of each color being milk chocolate are still given by its respective P<c>.

HOWEVER, if I do the above until I have a black-white pair AND exactly one milk chocolate between them, the probabilities of the two colors being milk chocolate are no longer independent. In this case:

P<wb, md> = [P<w>*(1-P<b>)] + [P<b>*(1-P<w>)]

HOWEVER, this does NOT represent the crime case correctly. One of the parties committed the crime and must be a criminal, BUT the other party can be either a 'criminal' or 'honest' [i.e. his 'criminality' is immaterial]. We can no longer use the marbles or candies that elementary probability texts so adore.

P = [P<w>*(1)] + [P<b>*(1)] - [P<b>*P<w>]
because [P<w>*(1)] + [P<b>*(1)] double-counts the instances
where both men are criminals (once in P<w> and once in P<b>)


This tells us the probability of the situation, but it does not yet tell us exactly how to apportion the chances of guilt between the two suspects. Since the equation (and especially the third term) is symmetric with respect to P<w> and P<b>, we might think:

P<white guilty> = P<w> - [P<b>*P<w>]/2
P<black guilty> = P<b> - [P<b>*P<w>]/2
Note that the chances of black and white guilt are *not* independent.
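To see what this tentative split gives for the OP's numbers, here is a small sketch. It simply evaluates the two expressions above with P<w> = 5% and P<b> = 20%; the final normalisation into a share of guilt is an added illustration, not part of the derivation.

```python
from fractions import Fraction

P_w = Fraction(1, 20)   # 5% of whites are criminals
P_b = Fraction(1, 5)    # 20% of blacks are criminals

# Probability that at least one of the two men is a criminal.
P_either = P_w + P_b - P_b * P_w

# Tentative split: each side gives up half of the double-counted overlap.
white_guilty = P_w - (P_b * P_w) / 2
black_guilty = P_b - (P_b * P_w) / 2

# The two pieces add back up to the total.
assert white_guilty + black_guilty == P_either

# Illustrative normalisation: share of the total assigned to the black suspect.
black_share = black_guilty / P_either

print(f"P(either criminal) = {float(P_either):.3f}")
print(f"white: {float(white_guilty):.3f}, black: {float(black_guilty):.3f}")
print(f"black share = {float(black_share):.4f}")
```

Under this split the black suspect's share comes out at 13/16, which sits between the 80% and 82.6% figures obtained elsewhere in the thread from pair counting.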

I'm not 100% certain that this is the best possible answer, but it's definitely as far as you'd get in Probability 101. There are several potential issues that this simple derivation does not address [such as the surprising 'fractional black criminal' in my last post]. When you get away from simple models (e.g. by moving from continuous to discrete mathematics) interesting results often fall out of the cracks.
  #50  
Old 01-04-2004, 07:32 AM
sailor sailor is offline
Guest
 
Join Date: Mar 2000
Lance Turbo, I believe your analysis of counting possible pairs is correct but I do not think you should leave out bad-bad pairs so I would redo it like this:

You have 1,710,000 + 90,000 = 1,800,000 cases with a bad black guy.
You have 360,000 + 90,000 = 450,000 cases with a bad white guy
Therefore the probabilities are exactly 80% and 20% respectively. I believe this is the correct solution until someone can point out why it is wrong (and I am sure someone will come along shortly and do just that).
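The tally with the bad-bad pairs left in can be sketched as follows. It assumes the same equally likely white/black pairs as the original count; note that the 90,000 bad-bad pairs appear in both tallies, so the two totals overlap.

```python
white_honest, white_bad = 8550, 450
black_honest, black_bad = 800, 200

# Pairs containing a bad black guy (the white partner may be honest or bad).
bad_black_cases = (white_honest + white_bad) * black_bad
# Pairs containing a bad white guy (the black partner may be honest or bad).
bad_white_cases = white_bad * (black_honest + black_bad)

share_black = bad_black_cases / (bad_black_cases + bad_white_cases)
share_white = bad_white_cases / (bad_black_cases + bad_white_cases)

print(f"Cases with a bad black guy: {bad_black_cases:,}")
print(f"Cases with a bad white guy: {bad_white_cases:,}")
print(f"Shares: black {share_black:.0%}, white {share_white:.0%}")
```

The 80/20 split is the same as the ratio of the two criminal rates (20% to 5%), since each tally is just the group's rate times the 9,000,000 pairs.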