#1
Statistics & profiling problem
Reading a thread going on in GD, this question occurred to me.
The town of X has a population of 10,000, of which 10% are black and 90% are white. Of blacks, 20% are criminals and 80% are honest, while of whites 5% are criminals and 95% are honest. Code:
          Honest    Bad    Total
Whites:     8550    450     9000
Black:       800    200     1000
Totals:     9450    650    10000
A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both, so he has to choose which one to go after.
Hmmm, he thinks, 70% of crimes are committed by white guys, which is more than twice the percentage of crimes being committed by black guys. I should therefore go after the white guy.
Oh, but wait, taken individually the white guy has a probability of 5% of being a criminal but the black guy has a probability of 20%, which is 4 times higher. I should therefore go after the black guy.
What is the true, correct and definitive answer? Who should the officer chase?
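For reference, the two competing statistics in the officer's head can be computed directly from the table. This is only a sketch; the variable names are made up for illustration and the counts are the ones given in the post.

```python
from fractions import Fraction

# Town of X, as given in the post (counts of people).
white_honest, white_bad = 8550, 450
black_honest, black_bad = 800, 200

# Statistic 1: of all criminals, what share is white?
share_white_among_criminals = Fraction(white_bad, white_bad + black_bad)

# Statistic 2: within each group, what share is criminal?
rate_white = Fraction(white_bad, white_honest + white_bad)
rate_black = Fraction(black_bad, black_honest + black_bad)

print(share_white_among_criminals)      # 9/13, i.e. roughly the "70%" in the post
print(rate_white, rate_black)           # 1/20 1/5
print(rate_black / rate_white)          # 4
```

Both numbers are correct; the question is which one (if either) answers "who should the officer chase?".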
__________________
Posted using 100% recycled electrons.
#2
He should wound one in the leg and then chase the other.
#3
This looks like a problem that can be addressed by Bayes' theorem, but I'm not 100% sure how to apply it here.
#4
First the math, then the explanation of why the math is probably not applicable anyway.
So: mathematically, assuming that the black man is randomly selected from the set of all black people in the town of X, then the probability that he is a "bad guy" is 20% as you've stated. So in that sense, the black guy is more likely to be a "bad guy" than the white guy (assuming that the white guy is a randomly selected white guy as well). The fact that there are more white bad guys than black ones means that if you select a bad guy at random then he's more likely to be white than black, but that's not what's happening in the situation you describe.
But of course, these people aren't randomly selected, are they? It's not as if the census bureau picked one white man and one black man at random and dropped them at the scene. So statistical conclusions that assume the people are randomly selected, like the one in the previous paragraph, are of questionable merit at best. For all the cop knows there's a secret white bad guy convention going on down the street. The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.
#5
Well, while the cop is doing the math, both of them will get away. I suggest that we should agree with don't ask. Too bad that we have a racial system in the US. Cops shouldn't be asked to judge between black or white.
__________________
A committee is a thing which takes a week to do what one good man can do in an hour. ~Elbert Hubbard
#6
>Hmmm, he thinks, 70% of crimes are committed by white guys which is more than twice the percentage of crimes being committed by black guys. I should therefore go after the white guy.
I think this is a red herring. It would have influenced how likely it would be that one white and one black are leaving the scene, but that's a given and so the statistic is not useful here.
>Oh, but wait, taken individually the white guy has a probability of 5% of being a criminal but the black guy has a probability of 20% which is 4 times higher. I should therefore go after the black guy.
I think this is the relevant statistic. It's also the correct conclusion, if we're given that the cop should take advantage of the statistical information and also that he has no other criteria to use. These aren't necessarily trivial points.
>The cop really has no information to suggest which suspect is more likely to be a criminal, without making an unfounded statistical assumption.
I think this isn't correct, in two ways. The statistical information is certainly "information to suggest which suspect is more likely to be a criminal", though it is weak information that still leaves the criminal's identity quite uncertain. Weak information is more useful than none at all. And I don't hear anything about assumptions here, founded or not.
BTW, sailor, you're not afraid to ask the tough questions, are you? Perhaps we should substitute "army men" for "white men" and "navy men" for "black men". After all, why offend anybody needlessly?
#7
It's not quite right to suggest that just because you don't know everything you can't know anything. In the absence of complete information you can only make guesses, of course, and these guesses will sometimes be wrong; but this is a far cry from having no information at all. Using probability theory is one way of trying to make better guesses: i.e. guesses which are, statistically, more likely to be correct. Of course this relies on various assumptions (e.g. statistical independence between various random variables), and if your assumptions are badly wrong then you might make worse guesses instead.
A Bayesian might analyze the situation as follows: Initially the officer sees two men, X and Y, running from the scene, and assigns each of them equal prior probabilities of being the criminal:
P(Y,¬X) = 1/2 (prior probability that Y is criminal and X is not)
P(X,¬Y) = 1/2 (prior probability that X is criminal and Y is not)
As he approaches, he sees further that X is black (Xb) and Y is white (Yw). He now updates his priors with the new information, using Bayes' Law (linked by ultrafilter above):
P(Y,¬X | Xb,Yw) = P(Xb,Yw | Y,¬X) P(Y,¬X) / P(Xb,Yw)
P(X,¬Y | Xb,Yw) = P(Xb,Yw | X,¬Y) P(X,¬Y) / P(Xb,Yw)
The denominator (basically a normalization factor) is computed by summing over all possibilities:
P(Xb,Yw) = P(Xb,Yw | Y,¬X) P(Y,¬X) + P(Xb,Yw | X,¬Y) P(X,¬Y).
Now how can we compute P(Xb,Yw | Y,¬X) (the probability that X is black and Y is white, given that Y is criminal and X is not)? We might assume (in the absence of more comprehensive statistical information) that the probability that X is black does not depend on whether Y is criminal, i.e. that the actions of X and Y are basically independent of each other. In this case we can write
P(Xb,Yw | Y,¬X) = P(Xb | ¬X) P(Yw | Y).
Now P(Xb | ¬X) (the probability that X is black, given that he is not a criminal) and P(Yw | Y) (the probability that Y is white, given that he is a criminal) are given in the statistical tables provided:
P(Xb | ¬X) = 800/9350 = 16/187 [note typo "9450" in table]
P(Yw | Y) = 450/650 = 9/13
So
P(Xb,Yw | Y,¬X) = (16/187) (9/13) = 144/2431
and similarly
P(Xb,Yw | X,¬Y) = (171/187) (4/13) = 684/2431
so
P(Xb,Yw) = (144/2431)(1/2) + (684/2431)(1/2) = 414/2431
and the updated posterior probabilities are
P(Y,¬X | Xb,Yw) = (144/2431)(1/2) / (414/2431) = 72/414 = 4/23 (posterior probability that Y is criminal and X is not)
P(X,¬Y | Xb,Yw) = (684/2431)(1/2) / (414/2431) = 342/414 = 19/23 (posterior probability that X is criminal and Y is not)
This approach, of updating a priori probabilities to reflect new information using Bayes' Law, is called Bayesian inference. It's an extremely useful statistical technique, though (as with all statistical techniques) it relies on having valid data and assumptions.
Depending on how you define "better," it may be appropriate to consider factors besides the probabilistic results; some of these other factors come into arguments against profiling. (The game-theoretic aspects of policing, for example, mean that the actions the officer takes in this round may affect the approaches taken by the parties in future rounds.)
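The arithmetic above can be checked mechanically. The following is a minimal sketch using exact fractions; the variable names are illustrative only, and it bakes in the same assumptions as the post (equal priors, exactly one of the two men is the criminal, and each man's race depends only on his own guilt).

```python
from fractions import Fraction

# Counts from the table (with the honest total corrected to 9350).
white_honest, white_bad = 8550, 450
black_honest, black_bad = 800, 200
honest = white_honest + black_honest   # 9350
bad = white_bad + black_bad            # 650

prior = Fraction(1, 2)  # equal prior for "X is the criminal" and "Y is the criminal"

# Likelihood of observing (X black, Y white) under each hypothesis.
like_y_guilty = Fraction(black_honest, honest) * Fraction(white_bad, bad)   # P(Xb|¬X) P(Yw|Y)
like_x_guilty = Fraction(black_bad, bad) * Fraction(white_honest, honest)   # P(Xb|X) P(Yw|¬Y)

# Normalization factor P(Xb,Yw).
evidence = like_y_guilty * prior + like_x_guilty * prior

post_y_guilty = like_y_guilty * prior / evidence   # white man (Y) is the criminal
post_x_guilty = like_x_guilty * prior / evidence   # black man (X) is the criminal

print(like_y_guilty, like_x_guilty, evidence)   # 144/2431 684/2431 414/2431
print(post_y_guilty, post_x_guilty)             # 4/23 19/23
```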
#8
Quote:
#9
Quote:
Quote:
#10
Quote:
#11
Quote:
#12
Omphaloskeptic, the more I think about it the more confused I am and I arrive at contradictory results but none are what you say.
One analysis: The cop arrives on the scene and sees a crime was committed. He sees no one and thinks correctly that the chances are 70% that it was a white bad guy. Now he sees the black and the white guy. This yields no new information, so he has better chances of catching the criminal if he goes after the white guy. Right now I think this is correct.
#13
Quote:
Beforehand, your suspicion that it was probably a white guy was weighted by the fraction of white suspects (large relative to the fraction of black suspects). But since they are no longer suspects, they (along with the rest of the 9998 other people in town) don't weight the results any longer. What's important now is how likely each of these two individuals is to be a criminal.
#14
Quote:
With no information other than a crime scene, you would assume it was 70% likely a white suspect, because whites commit 70% of the crimes. However, the presence of a black person is noted among the two individuals leaving the scene, and by your stats any given black guy is about 4 times as likely to commit a crime as any given white guy. This seems to correlate with the results of a 4/23 chance of a white perp vs a 19/23 chance of a black perp.
#15
I'm going to have to think about that. I say this because I remember stumbling over another probability problem in another thread long time ago. It was the Monty Hall 3 door problem. I argued adamantly and suddenly, a little light turned on and I realized I was wrong. But it took quite some time and effort. This time I am going to be more prudent, especially because I am even less sure this time. Let me give it some thought.
#16
Actually, there are a whole passel of assumptions here. Please indulge me as I toy with some of them. If you just want the "meat" [genuine objection] skip to the bolded "Final Analysis".
ISSUE #1: You presume "Criminals" are always guilty, and "Honest men" never are - all criminals are born guilty, and all honest men are forever sainted. This is necessarily false, and weakens the derivation above. In real life, of course, all men are born innocent, and some become guilty at some point. Without knowing the rate of (convicted) first offenses - the rate at which honest men become criminals - we can only guess (or approximate) the actual probabilities.
ISSUE #2: What do real cops think if they see two men running away from the scene of a crime? They think both are guilty! I'll get back to this point.
ISSUE #3: If all guilty men are "always guilty" then we get some truly funky situations. If a penny is stolen from the till, every criminal in the vicinity must be guilty - a neat trick, and a ludicrous assumption for the real world.
You may argue that the unstated principle ('assumption' is more like it) that only one is guilty is implicit in the formulation of this problem (and similar problems of its class). But that forces a revision of the math. You may say "the crime was murder, and only one gunshot was heard" (we'll ignore the crime of 'fleeing the scene' - which only casts more doubt on the presumption that "only criminals commit crimes"). We still have to refine the derivation.
If both men were chosen at random, then a Real Cop's initial assumption (both are criminals for fleeing the scene) is correct 1% of the time, and the "reader's assumption" (there is only one criminal) is correct 23% of the time - but 76% of the time there'd be no victim at all!
The apparently relevant denominator is ambiguous. It's either A) "all cases where at least one man is a criminal" (which I think is the only physically arguable case); or B) "all cases where only one man is a criminal" (whose only merit is conforming to a common presumption).
Pa = 450/9000 + 200/1000 - [(450/9000) * (200/1000)] = 24% (the subtracted term removes the 'double-counted' overlap)
Pb = 450/9000 + 200/1000 - 2*[(450/9000) * (200/1000)] = 23% (the subtracted term removes *both* counts of the overlap)
In either case, we can only rely on the prevalence of "guilty" men in each racial subgroup, 20% (B) vs 5% (W) - but the analysis is completely flawed, because in a random matching of candidates the actual murderer would be White roughly 450/650 of the time. Why were we wrong?
FINAL ANALYSIS
--------------------
The flaw is: crimes are committed solely by criminals. Any number which includes innocent people merely obfuscates the issue. This includes statistics like "percentage of criminals" or "total population", which are affected by the number of innocents. Changing the number of innocents does not affect the probability that a man is guilty. The statistically valid denominator is the number of potential murderers in town. All the Chinese in China, or all the innocent Chinese in town, are irrelevant.
Issue 4: The problem was set up to demand a black man and a white man at the scene. The universe of random pairings, however, does not reflect the underlying events: each black man, guilty or innocent, is "forced to flee" 9 to 10.6875 times as many hypothetical crime scenes as each white man. This dramatically overestimates the possibility of black guilt.
EFFECT 1: In the universe of cases where a black man is guilty, the analysis provided pairs him against 9000 white men [case A] or 8550 innocent white men [case B], while each white man is mathematically paired against only 1000 or 800.
If you wrote a chart of every "random pairing", each black man's name would appear either 9000 or 8550 times, while each white man's name would only appear on the chart 1000 or 800 times. Throw a dart at this chart, and the result isn't "fair" - the black men have 9000/1000 or 8550/800 times as many slips in the hat.
To illustrate this, make the numbers more extreme. Say that there are only 10 blacks in the city (and 1 black criminal) while there are 10 million whites (and 50,000 white criminals). By (improper) Bayesian analysis, that black man must commit virtually all the crime that occurs in his vicinity, while the 50,000 white criminals sit on their hands. Within a year (before new stats can be issued), every black person would be shot many times, and no white man would be caught in this situation.
EFFECT 2: Relying on prevalence in subpopulations will skew all future statistics, even if the population sizes are equal. In cases where men of both races are suspected, the black criminals will always be caught [and be counted], and the white criminals will always escape [and not be counted]. This effect is strongest as the population with the highest pre-existing Effect 1 increases as the black fraction of the total population decreases. Effect 2 increases as the black fraction increases.
Racial profiling is a Big Lose for blacks, guilty or innocent.
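The 1% / 23% / 76% split and the Pa and Pb figures in this post can be reproduced with a few lines. This is only a sketch under the post's own premise that one white man and one black man are each drawn at random from their group; the variable names are illustrative.

```python
from fractions import Fraction

p_white = Fraction(450, 9000)   # chance a randomly drawn white man is a criminal
p_black = Fraction(200, 1000)   # chance a randomly drawn black man is a criminal

both = p_white * p_black
exactly_one = p_white * (1 - p_black) + p_black * (1 - p_white)   # Pb in the post
neither = (1 - p_white) * (1 - p_black)
at_least_one = 1 - neither                                        # Pa in the post

print(both, exactly_one, neither, at_least_one)   # 1/100 23/100 19/25 6/25

# Conditioning on case B (exactly one criminal in the pair):
black_given_one = p_black * (1 - p_white) / exactly_one
white_given_one = p_white * (1 - p_black) / exactly_one
print(black_given_one, white_given_one)           # 19/23 4/23
```

Under case B the conditional split comes out to 19/23 versus 4/23; whether random pairing is the right model at all is the point this post disputes.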
#17
Well, at least the rest of us understand the problem as it was enunciated even if we're still having problems finding or understanding the solution.
#18
Quote:
There's a reason my first response stopped where it did (with the computation of the posterior probabilities) and not with a complete answer to the OP's question.
Quote:
Quote:
#19
My point is that the problem, as it is enunciated, is flawed.
The "understanding" you cite is what creates the confusion. My final analysis points out why. Sorry about the rest of the stuff, if it didn't interest you. I probably should have taken the racial example less literally.
However, picking assumptions apart is not irrelevant. It is an essential first step in mathematical analysis. If this were a more strictly mathematical forum, the assumptions would have been picked apart a lot more already. That's just part of how the game of math (vs. arithmetic) is played.
#20
Quote:
#21
Then again, the problems raised so far can also be used to understand The Problems With Profiling.
The guys could be running from the scene because they are afraid of being wrongly harassed by the police. And in fact we would expect this to be more of a factor for the member of the group that profiling suggests we should harass.
So, the person that profiling says we should harass is running because he knows he's going to get blamed, and the guy that profiling says we shouldn't harass is running because he's guilty. So the game theory approach to the problem says we should catch the guy that we don't think we should catch.
And suddenly the reasoning starts reminding me way too much of The Princess Bride.
#22
Luis Tiant, a Cuban born pitcher for the Red Sox, was stopped for jogging in the predominantly white neighborhood in which he lived. The cop's reasoning was that a black man running in that vicinity must be fleeing from something.
#23
Please take your hijacks to GD
Please! This is not a discussion about racial profiling so I would appreciate it if those wishing to discuss racial profiling would take their discussion to another thread. My question is 100 percent a probability problem and the relevant data is given in my OP. The question is: given the information the cop has, what can he infer with regards to who is the most likely suspect? That is the question. The fact that in the real world there is no town named X, or other similar real-world considerations, are totally outside the problem I am presenting. I am interested in learning about statistics, not about the racial problems of America.
#24
>A crime has been committed and an officer arrives on the scene and sees a white guy and a black guy leaving the scene in opposite directions. He cannot go after both so he has to choose which one to go after.
>This is not a discussion about racial profiling
sailor, you strain me!
#25
Q. Delta flight #45678 leaves Washington DC and flies towards its destination located 6000 miles away at a speed of 2000 mph. How long will it take to get there?
A. There is no Delta flight #45678.
There is no airport located in the District of Columbia.
No commercial airliner can fly at 2000 mph.
You think that would be considered the correct answer in most schools and colleges?
It seems you cannot mention race, guns, abortion, religion and a number of other hot issues, no matter how in passing, without the thread being hijacked to hell by people who want to argue their pet subjects. Please take it to another thread. This is about mathematical probabilities.
#26
Common sense math answer that will surely infuriate mathematicians:
There's a 9% chance that of the two men one will be white and one will be black. So we now know that this combination is rare in this town.
Calculating the probability that the Black dude will be a criminal gives a 1.8% possibility looking at the entire population, or a 2.9% possibility looking at the criminal population.
Calculating the probability that the White dude will be a criminal gives a 4.5% possibility looking at the entire population. It gives a 5.4% possibility looking at the population of criminals.
Thus about 5-2 odds that chasing the White dude will get the cop a collar, given no other information.
#27
ALL RIGHT, ALL RIGHT STOP IT !
IT WAS ME, OK? I admit it.
#28
hroeder, you have thrown in a new perspective which had not occurred to me and which makes sense but I think you may have the numbers wrong. Let me do some math and see what I get.
#29
A) The white guy did it. Then the black guy can be *any* of the 1000 black guys in town, good or bad.
B) The black guy did it. Then the white guy can be *any* of the 9000 white guys in town, good or bad.
So, we have 9000 cases in which the black guy did it versus 1000 cases in which the white guy did it. The reasoning seems correct to me and yet the conclusion seems wrong in that the process does not even take into account the numbers of bad guys in each group. I assume the process is wrong but I cannot put my finger on it. Someone?
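One possible way to repair this count (a sketch, not a settled answer) is to count pairs rather than bystanders: for each case, multiply the number of possible culprits by the number of possible innocent companions. The assumptions here, which are not part of the post above, are that exactly one of the two men is a criminal and that every black/white pair is equally likely to be the one at the scene.

```python
from fractions import Fraction

white_honest, white_bad = 8550, 450
black_honest, black_bad = 800, 200

# Case A: a white criminal did it, accompanied by an honest black man.
pairs_white_guilty = white_bad * black_honest   # 450 * 800
# Case B: a black criminal did it, accompanied by an honest white man.
pairs_black_guilty = black_bad * white_honest   # 200 * 8550

total = pairs_white_guilty + pairs_black_guilty
print(pairs_white_guilty, pairs_black_guilty)           # 360000 1710000
print(Fraction(pairs_white_guilty, total))              # 4/23
print(Fraction(pairs_black_guilty, total))              # 19/23
```

Counted this way, the number of bad guys in each group does enter the result, and the ratio matches the 4/23 versus 19/23 figures from the Bayesian post.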
#30
I still can't put my finger on it but it seems more reasonable to use strictly the numbers of bad guys in which case the probability is 70% that the white guy did it. I feel quite confident with that.
At least I suppose that is a valid first analysis. Then I suppose it could be refined, but I still doubt the result got by Omphaloskeptic that the probability is only 17% that the white guy did it. I think there *has* to be some error along the way. The answer may be neither number but right now, if I had to pick between 17% and 70%, I would pick 70% that it was the white guy. I would also bet that the correct answer is not higher than 70% but not as low as 17%.
#31
Hroeder. I largely agree, but with a significant distinction.
There are 450 white criminals and 200 black criminals. With no data on any differences in the extent of the criminal records of black vs. white criminals, we can only consider them equally "criminal", making the odds 450:200 (69.23%) that any crime is committed by a white.
As I said earlier (but apparently, not well): the number or race of innocent people is irrelevant. By the assumptions of the problem, if the entire town were present at a speech and the speaker was assassinated, you'd have 650 suspects, not 10,000. If 1000 (honest) FBI agents were present, there are still 650 suspects, not 11,000. If the town's 1000 Quakers, black and white, are never criminals, and didn't attend, there are still 650 suspects, not 9000.
If "all criminals are equal", then there's a 69.23% chance the assassin is white, even if *everyone* flees the scene in fear.
If ONLY criminals commit crimes, only criminals are relevant. It's always possible to calculate the criminal %age for any group (e.g. by last digit of SSN), but that merely compares the irrelevant innocents to the relevant criminals. The population of China is also innocent of this crime; should we use the ratio of black criminals to native Chinese to decide between the black and white suspect?
The fact is: the ratios of black and white criminals to native Chinese actually give the right answer! Why? Because we're using the *same* denominator to compare both sets of criminals. "Innocents of the same race" or "total population of same race" are inappropriate, skewing denominators.
#32
Yes, I think KP's analysis is right. At least initially the odds are 450/200 that a white guy did it and that is the conclusion if the officer arrives at the scene and sees no one.
The question is whether seeing a white guy and a black guy adds any relevant information which would alter the odds, and that is where I am not clear, but right now I can't see how it does. If it did it would lower the figure somewhat, but I can't see how it could lower it to anything close to 17%; even 50% would seem a stretch.
#33
OK, let's change the parameters and see if the conclusion (that the odds are 450:200) makes sense.
Here are the original statistics (corrected): Code:
          Honest    Bad    Total
White:      8550    450     9000
Black:       800    200     1000
Totals:     9350    650    10000
Code:
          Honest    Bad    Total
White:      9350    450     9800
Black:         0    200      200
Totals:     9350    650    10000
I can make this even more extreme by adding honest whites (Example 3): Code:
           Honest    Bad      Total
White:     999350    450     999800
Black:          0    200        200
Totals:    999350    650    1000000
I'm trying to come up with a more intuitive example, but (to me at least) these two extreme cases make the 450:200 odds seem very unreasonable.
Let me try another explanation of the new information: You have two populations (whites and blacks). The relevant feature of these populations is that the smaller population has a higher proportion of criminals. You see two suspects at the scene, and (by the tacit assumption in the problem) exactly one is guilty. The fact that exactly one of the suspects from the smaller, higher-crime population is at the scene is relevant because it is a relatively unlikely occurrence.
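To see what the three tables above imply, here is a small sketch that applies the same equal-prior Bayesian update to each of them. The function name `posterior_black_guilty` is made up for this illustration, and the model assumes exactly one of the two suspects is guilty.

```python
from fractions import Fraction

def posterior_black_guilty(white_honest, white_bad, black_honest, black_bad):
    """P(black suspect is the criminal | one black and one white suspect, exactly one guilty)."""
    black_guilty = black_bad * white_honest   # black criminal paired with an honest white
    white_guilty = white_bad * black_honest   # white criminal paired with an honest black
    return Fraction(black_guilty, black_guilty + white_guilty)

examples = {
    "original (corrected)": (8550, 450, 800, 200),
    "second table": (9350, 450, 0, 200),
    "Example 3": (999350, 450, 0, 200),
}

for name, counts in examples.items():
    print(name, posterior_black_guilty(*counts))
# The original table gives 19/23; both tables with zero honest blacks give 1.
```

In the last two tables there are no honest black men at all, so if exactly one suspect is guilty the innocent one has to be the white man, however many white criminals exist. That is why the fixed 450:200 odds look unreasonable there.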
#34
Quote:
(The innocent population of China, etc., is irrelevant to the question not because of its innocence but because it is not part of the problem universe. Charles Manson is also irrelevant to the question for the same reason, even though he's guilty.)
#35
I have to admit I am totally perplexed by what seems such a simple probability problem. Several analyses all seem correct to me and yet they cannot all be correct, as they lead to contradictory results. Then I can find fault with all. As I say, I am perplexed.
#36
Omphaloskeptic makes a valid point. I'm not sure I agree with his explanation, but his numeric argument is solid.
I've played with some numbers, and it appears that there is some effect beyond the population size effect [Effect #1 in my original post] that seems to be nonlinear - perhaps a factor of x/(1-x) - "the ratio between empty space and filled space" in a fixed size container. Such equations behave differently when compared quantities are near each other ["in the same regime"] vs. when they are far apart. I overlooked this factor in the original example where both quantities were in the same regime.
I'll need to work out a single unified equation to understand in which regimes various factors predominate. I'll report back then.
But again: Omphaloskeptic is right: I was wrong to say the number of innocents is completely irrelevant. (We all know what happens when people ignore, bury or forget their mistakes, so I try to emphasize mine.)
#37
I don't know Bayesian math, but it seems to me that whatever solution is correct must fulfill the following criteria, if B is the fraction of blacks which are bad and W is the fraction of whites which are bad, regardless of the overall populations:
#38
Achernar, in simple terms what you are saying is that the cop should look at the probabilities that each individual is bad. Your assertion makes sense from a certain point of view, and this is the problem I am having: different POVs yield different answers. Look:
The cop arrives and thinks of the crime. Clearly the probability that it was committed by a white guy is 70%, because 70% of crimes in the city are being committed by white guys. So he should chase Mr White. The logic seems irrefutable.
But now he forgets about the crime when he sees the two guys and thinks (as you point out) the probability that a black individual is a bad guy is four times higher than the probability that a white guy is bad. So he should chase Mr Black.
Both reasonings look sound to me and yet they can't both be true because they are contradictory. Probably the correct answer to the probability of the white guy being who did it lies somewhere between the upper 70% and the lower 20% which each scenario yields. But that is still a huge range and I do not know what logical process would combine both factors to give the correct answer.
#39
Quote:
#40
Well, as I wrote it, case 4 contradicts specific cases of 1, 2, and 3. But I meant for it to supersede them.
#41
I'm trying to write a more detailed explanation of my original (Bayesian) answer, but for now let me try to explain why I think your 70% solution is wrong:
Quote:
Quote:
I hesitate to bring up a different example, because I think analogies usually just confuse the issue, but here's my attempt (NB: this is primarily aimed at the specific problem I see with the reasoning above, and not as a complete analogy): Quote:
#42
If B = W = 1, then both men are criminals, so chasing either man could result in catching a criminal, if not the perpetrator of this crime. So the best choice is to chase one of them--perhaps the one who is running more slowly.
If B = W = 0, then neither man is the criminal, but may be a witness. Again, the best choice is to chase one of them.
So it looks to me like case 1 contains case 4, and cases 2 and 3 should be taken to exclude B = W. But I do agree that any solution must have this property.
#43
Okay, I agree ultrafilter. I was thinking that the cop assumed that exactly one of the two was bad, in which case Case 4 would contradict this assumption. But I realize now that this assumption is not necessary.
#44
I haven't had a chance to fully work out the unified equation, but I did come up with a few thoughts I thought I'd throw out. (I'm still working on these, too)
The key concept here seems to be "the universe of (applicable) possibilities" or the "gross denominator". When making a chart or possibilities, you must pick the correct rows and columns, or "counting cells" will mislead you. As I earlier noted, making a matching grid of blacks vs whites produces a false bias. In such a grid, each additional white person increases the number of cells in each black criminal's row, but not any white criminal's column. But does a black man actually become "more probably guilty" if white person moves to town? To remove any obscuring 'intuitions', let's rename the conditions. The town is a former all-boy's school. Blacks and whites are girls and boys. 'Criminals' are 'drivers'. The crime becomes an accident where a truck hits a car, killing the driver, but not the one passenger. We'll assume all dating is in-school and heterosexual. Just as "only criminals commit crimes", only drivers drive, but a license doesn't prove you weren't a passenger. A criminal record doesn't prove guilt in any later crime. In this example, it's easier to see that to assess the odds that a girl died, you don't multiply by the number of "available boys" (vs. the girls available to the guys) The date happened. There was ample opportunity for it to happen (the number of potential partners amply exceeds the number of drivers) Since it is not a limiting condition, you should leave it alone (Below, we'll see how Example C hits a limiting condition) Statistical "opportunity" can be illusory: pedestrians aren't run over 10x as much in cities with 10x more roads; in fact the accident rate is often higher with fewer roads. Dating -or having innocents near your crime- usually isn't limited by the number of possible partners, so the effect of more potential partners isn't calculable. When you hear of the accident, knowing that the accident was on a date (vs. 
with a same sex friend) may your affect assessment on a person-by-person (not gender) basis The fraction of licensed boys vs. girls (5% vs 20%) DOESN'T affect the probable gender of the victim. The fraction of drivers who are boys (69.23) vs girls (30.77) DOES. "Licensure rate by gender" (criminality by race) is a sloppily framed statistic which could only be used if we felt we 'needed' to judge by raw gender, just as the original scenario was crafted to FORCE us to judge by race: the only answers we're allowed to give are "black" or "white" An insurance company would go broke if they used "accidents per girl" instead of "Accidents per girl driver" to calculate rates. The scenario makes it sound like the cop MUST decide based on race, but in fact, he could chase the one who is closer, slower, wearing lighter colored clothing (easier to see at night), looks easier to subdue, is headed toward less concealing cover, or even choose one at random. A cop who sees two fleeing suspects and sees only race is a poor cop indeed. Now let's remove race intuitions from Oomphaloskeptic's most extreme Example C: It's a post-Apocalyptic future after a cruel bioweapon killed almost all women. By tradition, all women drive (at first, they didn't dare ride with a man!) but almost no men are allowed to drive (they might catch the few women). After a century or so, women are no longer afraid; they are worshipped and protected. It's very rare to see a man alone with a woman (who are 1/5000th of the city). Yet one day, a paper reports that -horrors- a accident killed the driver of a car containing a woman. The whole city wants to know: did a woman die? Like the original scenario, it is an unlikely event cherry-picked to make a point, but does the extreme rarity change the conclusion we established above? No, it doesn't! 
While I agree that, this time, it was probably a woman who was killed (a black who was guilty), the 100% prevalence of driving among women (criminality among blacks) is actually quite irrelevant. The apparently contradictory finding of Example C is not caused by the HIGH 100% rate, but by the ULTRA-LOW rates: the prevalence of women and the prevalence of driving males (the small number of blacks and the almost total noncriminality of whites in Example C). To see this, let's see how changing the 100% rate affects the "most probable outcome": Code:
DROPPING BLACK CRIMINAL RATE FROM 100% to 0.05% DOESN'T AFFECT EXAMPLE C
EVEN AT RATES SO LOW THAT NOT ONE SINGLE BLACK CRIMINAL EXISTS

         TOTAL  CRIMINALS    HONEST  Racial Prevalence         % CRIM
BLACK      200        200         0  1:5000                      100%
WHITE   999800        450    999350  1:1.00020004       0.0450090018%
TOTAL  1000000        650    999350                            0.065%
B+W: 200.05   B guilty: 199.96   W guilty: 0.09

         TOTAL  CRIMINALS    HONEST  Racial Prevalence         % CRIM
BLACK      200         20       180  1:5000                       10%
WHITE   999800        450    999350  1:1.00020004       0.0450090018%
TOTAL  1000000        470    999530                            0.047%
B+W: 20.086   B guilty: 19.996   W guilty: 0.09

         TOTAL  CRIMINALS    HONEST  Racial Prevalence         % CRIM
BLACK      200          2       198  1:5000                        1%
WHITE   999800        450    999350  1:1.00020004       0.0450090018%
TOTAL  1000000        452    999548                           0.0452%
B+W: 2.0896   B guilty: 1.9996   W guilty: 0.09

         TOTAL  CRIMINALS    HONEST  Racial Prevalence         % CRIM
BLACK      200        0.2     199.8  1:5000                      0.1%
WHITE   999800        450    999350  1:1.00020004       0.0450090018%
TOTAL  1000000      450.2  999549.8                          0.04502%

         TOTAL  CRIMINALS    HONEST  Racial Prevalence         % CRIM
BLACK      200        0.1     199.9  1:5000                     0.05%
WHITE   999800        450    999350  1:1.00020004       0.0450090018%
TOTAL  1000000      450.1  999549.9                          0.04501%
B+W: 0.18998   B guilty: 0.09998   W guilty: 0.09
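The post does not say how the "B guilty" and "W guilty" figures were computed. They are consistent with weighting each group's criminal count by the other group's share of the town, so the sketch below assumes that rule; the function name and constants are illustrative, not the poster's.

```python
from fractions import Fraction

TOWN = 1_000_000         # total population in Example C
BLACK_POP = 200          # 1 in 5000
WHITE_POP = TOWN - BLACK_POP
WHITE_CRIMINALS = 450

def guilt_weights(black_criminals):
    """Return (B guilty, W guilty) under the ASSUMED rule:
    criminals in one group times the other group's population share."""
    b = Fraction(black_criminals) * Fraction(WHITE_POP, TOWN)
    w = Fraction(WHITE_CRIMINALS) * Fraction(BLACK_POP, TOWN)
    return b, w

# Only the four tables that carry a "B+W" line in the post are reproduced.
for black_criminals in ("200", "20", "2", "0.1"):
    b, w = guilt_weights(black_criminals)
    print(f"B+W: {float(b + w):g}  B guilty: {float(b):g}  W guilty: {float(w):g}")
```

Under this reading "W guilty" never moves from 0.09, which is why the comparison depends almost entirely on how small the black population is.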
Apparently even "bad thoughts" by a black man should affect a cop's decision of whom to chase more than 450 actual white criminals do, if the "white crime rate" is low enough. Such 'small number effects' are non-linear enough to constitute a deliberately skewed sampling: e.g. a black population this tiny is deprived of the "statistical benefit" of "B/B" scenarios. At a black racial prevalence of 1:5000, the B/B effect is 0.00000004, while 99.98% of white crime is buried in W/W scenarios.
This makes Example C so sensitive to black misdeeds, and so forgiving of the prospect of white misdeeds, that it actually says you should arrest the black when there isn't a single black criminal but there are hundreds of white criminals. In fact, under Example C you could raise the white criminal rate to 100%, and the answer still wouldn't be "arrest the white man".
Interestingly, the most extreme case of an "Example C" scenario is "the only Chinese in town": if that person has a criminal record, Example C says he should be chased because his visible ethnicity has 100% criminality. Yet, in reality, the policeman should chase the white suspect: he can pick up the Chinese man later, but the white suspect is still unidentified. Quote:
I find this problem interesting mathematically, but I think there is substantial reason to say that the cop's knowledge of racial statistics is no more relevant than a thousand other details he would also have seen. Some cops are known for giving speeding tickets to sports cars or even just "red" cars, but despite studies indicating that these are worse offenders, I would argue that targeting this "high risk population" would be a poorer global practice than ticketing 'at random'. [I put red in quotes because I don't have a cite on speeding rates by color.] |
|
#45
|
|||
|
|||
|
>> Just because a quantity can be calculated, doesn't make it a sufficient basis for a decision.
Well, I may agree, but I am not having to make any decision. I'd just like to know the answer. I have been stumped by probability problems before, but this has to be in lesson 1 of Probability 101. It is the simplest probability case you can imagine, with just two variables. I'm pretty sure there's a correct answer hidden in there somewhere. |
|
#46
|
|||
|
|||
|
I'll agree with Achernar that the limit cases should behave as he specified, so that provides a guide for ruling out some solutions.
|
|
#47
|
|||
|
|||
|
Quote:
|
|
#48
|
|||
|
|||
|
You see a black guy and a white guy. There are exactly 9,000,000 possible black guy/white guy pairs.
Of those 9 million happy couples:
6,840,000 are an honest white guy and an honest black guy
1,710,000 are an honest white guy and a dishonest black guy
360,000 are a dishonest white guy and an honest black guy
90,000 are a dishonest white guy and a dishonest black guy
I believe we are assuming that we have one honest man and one dishonest man, so we can throw out the 6,930,000 cases of honest/honest and dishonest/dishonest. That leaves 2.07 million cases, of which about 82.6% contain a dishonest black man. Chase the black dude. |
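The pair counts in this post can be checked mechanically. This is a minimal sketch using the town's figures from the OP; the dictionary and function names are made up for illustration, and it follows the post's own assumption that exactly one of the two men is dishonest.

```python
from itertools import product

# Head counts from the OP's table.
WHITE = {"honest": 8550, "dishonest": 450}
BLACK = {"honest": 800, "dishonest": 200}

def pair_counts():
    """Count every (white guy, black guy) pairing by honesty status."""
    return {(w, b): WHITE[w] * BLACK[b] for w, b in product(WHITE, BLACK)}

counts = pair_counts()
# Keep only the pairs with exactly one dishonest man, as the post does.
mixed = counts[("honest", "dishonest")] + counts[("dishonest", "honest")]
black_share = counts[("honest", "dishonest")] / mixed

print(sum(counts.values()))          # 9000000
print(mixed)                         # 2070000
print(round(100 * black_share, 1))   # 82.6
```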
|
#49
|
|||
|
|||
|
Lance Turbo:
You absolutely CANNOT throw out the dishonest/dishonest cases and still get the correct numeric answer; those cases are intrinsic to the problem. However, thus far it seems likely to me that they may not affect your final decision under an algebraic model, except possibly under very extreme discrepancies in population size or prevalence. The question looks to be even trickier under a discrete math model (criminals and citizens must come in integer units). Quote:
I do understand what you mean, of course, but probability is always a matter of inexact knowledge. I've been trying to prove to myself whether the 'Probability 101' model is the best possible approximation, under what conditions it is weakest (if any), and whether it either over-assumes or under-uses the available data. I had assumed that was the primary thrust of the thread, since the resolution to your OP has already been given. However, re-reading the thread, it's clear that not everyone is debating the same issues. I apologize if my focus has caused confusion (apparently it's confused me at least once!), and I'll concede there's more than a small measure of 'Devil's Advocate' in it (I was taught that critical analysis is essential). I don't do it to annoy or mislead; it's actually a fair amount of work! To make up for that, here's the derivation of...
The "Probability 101" answer:
We have T<w> white candies and T<b> black candies. Some of each are milk chocolate (M) inside and some are dark chocolate (D). For a randomly selected candy of color c, M<c>:D<c> = the odds that it has milk chocolate. The probability is P<c> = M<c>/T<c>. The number of milk chocolates in each color is M<c> = P<c>*T<c>.
HOWEVER, since M<c> = P<c>*T<c>, M<w> can be greater than M<b> even if P<b> is greater than P<w>, if and only if T<w>/T<b> is greater than P<b>/P<w> [i.e. more crimes can be committed by the less criminal group, if it is large enough]. This is the resolution of the apparent paradox in the OP. (This is a 3-variable problem: you must know P<w>, P<b> and T<b>/T<w>.)
Independence is not a trivial issue in problems like these. The "Monty Hall paradox" hinges on the issue of whether seemingly independent consecutive options are genuinely independent, and therefore whether they have equal probability. If I draw a white candy and then a black candy from the bowl, they are independent events. Neither selection affects the other.
The odds of each candy being milk chocolate are given by its respective ratio M<c>:D<c> in the bowl. P<c> can also be used. If I draw two candies together, and then return them to the bowl, until I get a black and a white together, the two candies are still independent draws and the chances of each color being milk chocolate are still given by its respective P<c>.
HOWEVER, if I do the above until I have a black-white pair AND exactly one milk chocolate between them, the probabilities of the two colors being milk chocolate are no longer independent. In this case:
P<wb, md> = [P<w>*(1-P<b>)] + [P<b>*(1-P<w>)]
HOWEVER, this does NOT represent the crime case correctly. One of the parties committed the crime and must be a criminal, BUT the other party can be either a 'criminal' or 'honest' [i.e. his 'criminality' is immaterial]. We can no longer use the marbles or candies that elementary probability texts so adore.
P = [P<w>*(1)] + [P<b>*(1)] - [P<b>*P<w>]
because [P<w>*(1)] + [P<b>*(1)] double-counts the instances where both men are criminals (once in P<w> and once in P<b>).
This tells us the probability of the situation, but it does not yet tell us exactly how to apportion the chances of guilt between the two suspects. Since the equation (and especially the third term) is symmetric with respect to P<w> and P<b>, we might think:
P<white guilty> = P<w> - [P<b>*P<w>]/2
P<black guilty> = P<b> - [P<b>*P<w>]/2
Note that the chances of black and white guilt are *not* independent. I'm not 100% certain that this is the best possible answer, but it's definitely as far as you'd get in Probability 101. There are several potential issues that this simple derivation does not address [such as the surprising 'fractional black criminal' in my last post]. When you get away from simple models (e.g. by moving from continuous to discrete mathematics), interesting results often fall out of the cracks. |
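To make the formulas in this post concrete, here is a small sketch that plugs in the OP's rates (P<w> = 5%, P<b> = 20%). The function names are invented for illustration, and the last function only encodes the post's tentative "split the overlap evenly" idea, not an established result.

```python
from fractions import Fraction

def exactly_one(p_w, p_b):
    """Chance that exactly one member of a white/black pair is a criminal."""
    return p_w * (1 - p_b) + p_b * (1 - p_w)

def at_least_one(p_w, p_b):
    """Chance that at least one is a criminal (overlap subtracted once)."""
    return p_w + p_b - p_w * p_b

def symmetric_split(p_w, p_b):
    """The post's tentative apportionment: halve the both-criminal overlap."""
    overlap = p_w * p_b
    return p_w - overlap / 2, p_b - overlap / 2

p_w, p_b = Fraction(5, 100), Fraction(20, 100)
white_guilty, black_guilty = symmetric_split(p_w, p_b)

print(exactly_one(p_w, p_b))        # 23/100
print(at_least_one(p_w, p_b))       # 6/25
print(white_guilty, black_guilty)   # 9/200 39/200
```

The two apportioned values add back up to the "at least one" probability, which is the property the symmetric split was chosen to preserve.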
|
#50
|
|||
|
|||
|
Lance Turbo, I believe your analysis of counting possible pairs is correct but I do not think you should leave out bad-bad pairs so I would redo it like this:
You have 1,710,000 + 90,000 = 1,800,000 cases with a bad black guy.
You have 360,000 + 90,000 = 450,000 cases with a bad white guy.
Therefore the probabilities are exactly 80% and 20% respectively. I believe this is the correct solution until someone can point out why it is wrong (and I am sure someone will come along shortly and do just that). |
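The revised tally can be sketched the same way. The names below are illustrative; note that in this counting the 90,000 bad/bad pairs appear in both tallies, so the two tallies sum to 2,250,000 rather than to the 2,160,000 distinct pairs that contain at least one bad guy.

```python
from fractions import Fraction

# Head counts from the OP's table.
WHITES, BAD_WHITES = 9000, 450
BLACKS, BAD_BLACKS = 1000, 200

def shares_with_bad_bad_in_both():
    """Share of pairings per suspect, counting bad/bad pairs in both tallies."""
    bad_black_pairs = BAD_BLACKS * WHITES   # 1,710,000 + 90,000
    bad_white_pairs = BAD_WHITES * BLACKS   # 360,000 + 90,000
    total = bad_black_pairs + bad_white_pairs
    return Fraction(bad_black_pairs, total), Fraction(bad_white_pairs, total)

black_share, white_share = shares_with_bad_bad_in_both()
print(black_share, white_share)   # 4/5 1/5
```

The 4:1 result is the same ratio as the individual rates of 20% and 5%, because each tally multiplies one group's criminals by the other group's full head count.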
|
|