Let’s discuss Bayesian Statistics

Kimmy_Gibbler · April 18, 2012, 3:23am

Considering that much of classical probability theory was motivated by aristocratic Continental intellectuals’ taste for wagering (see martingale), it’s not so preposterous.

monstro · April 18, 2012, 3:30am

I’m not sure I understand what you’re saying.

Say you know that the crime statistics show that people who have a surname beginning with a “Z” are disproportionately convicted as murderers.

You are presented with a homicide case in which the suspect’s last name starts with a “Z”. The guy says he’s innocent.

You are asked to weigh in.

Do you say, “I don’t have enough information to weigh in”?

Or do you say, “Well, I know I don’t have enough information to give an intelligent opinion, but I do know that people with “Z” surnames are over-represented in the crime stats. I’m not saying that makes him guilty, but that means something about something, at least in theory, right?”

Your answer to this question will help me understand what exactly you’re arguing.

(If it helps, you can replace “Z” surnames with a certain gender. Or race. Or socioeconomic status. Whatever floats your boat.)

mister_nyx · April 18, 2012, 3:33am

Can I apologize further and hijack further? Because this reminds me of a question I wondered to myself about a few years back. It refers to human population genetics and a gene (or more correctly, an allele) that is more prevalent in some populations than others, according to geography. There’s no particular reason to think that it relates to any other human characteristic. But it seems to vaguely relate to a characteristic that correlates to the same geographic region.

Some researcher discovered this just by happening to notice a map of the gene’s prevalence. If it had been found by some statistical analysis of 500 loci that vary in the human genome, and happened to line up with some vague social fact, I would think, well, that it was a case of a spurious correlation. If we accept a 0.95 level of statistical significance, then we should expect to find 1 in 20 spurious correlations that meet statistical significance. (Right?)

But it wasn’t, it was just a single event, when one researcher noticed this genetic trait had a geographical component that matched with a social fact. Should that make me more confident that the two might be related?
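
The "1 in 20" arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: the 500 tests are treated as independent, each has a 5% false-positive rate, and no locus has any real effect. The variable names are made up for the example.

```python
# Assumptions (not from the thread): 500 independent tests, a 5% false-positive
# rate per test, and no true association at any locus.
n_tests = 500
alpha = 0.05

# Expected number of "significant" results that are pure chance.
expected_false_positives = n_tests * alpha

# Probability that at least one test comes up "significant" by chance alone.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected spurious hits: {expected_false_positives:.0f}")
print(f"P(at least one spurious hit): {prob_at_least_one:.6f}")
```

Under those assumptions a scan of 500 loci is all but guaranteed to turn up something spurious. Whether noticing a single map counts as one pre-specified test is a separate question that this sketch does not settle.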

ultrafilter · April 18, 2012, 3:48am

At some point, you need to decide whether you believe the coin is fair, or else there’s no point in doing statistics at all. How are you going to make that decision?

Absolutely. It may comfort you to know that I’m about a week away from finishing up the coursework for my PhD in statistics, and that absolutely no one in my department would even bat an eye at the question. So, how much would you bet?

You can very easily compute the probability of getting at least 75 heads in 100 flips of a fair coin. Again, you need to make a decision. What’s the basis for that decision?

I was thinking more of the typical method of (subjective) probability elicitation, which is all about how much you’d bet based on your beliefs regarding the probability of an event. Unfortunately there doesn’t seem to be a good and easily Googleable reference on the subject.

(Martingales date to right around World War II and are a bit late to be really motivated by the gambles of aristocrats.)

I don’t understand the point of making the statement. Yeah, it’s true, but so what?

steronz · April 18, 2012, 3:50am

Frictionless surfaces can be very useful. Imagine a police officer needs to determine if a stone slab that slid down a hill and crushed someone took more or less than 30 seconds, because that’s the window of time the suspect claims he had his back turned. He’s no physicist, but he went to college, so he does a quick napkin equation and determines that sans friction, the block would take 45 seconds. He makes an arrest, and lets an expert do a more precise calculation for a trial.

Incorrectly applying bayesian statistics, if I understand the OP correctly, is not only not useful, it can be actively detrimental. So why mention it?

Carmady · April 18, 2012, 7:57am

I think you left out a 0. Once every 3,500,000 attempts. And testing coin fairness is a 2 sided test, so what you should consider is that once every 1,750,000 attempts you would get a result that bad.
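
For anyone who wants to check those figures, here is a minimal sketch that sums the binomial tail directly. It assumes a fair coin and 100 independent flips; `upper_tail` is a helper name invented for this example.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# At least 75 heads in 100 flips of a fair coin.
one_sided = upper_tail(100, 75)

# At least 75 heads OR at least 75 tails; the two tails are equal for a fair coin.
two_sided = 2 * one_sided

print(f"one-sided: about 1 in {1 / one_sided:,.0f}")
print(f"two-sided: about 1 in {1 / two_sided:,.0f}")
```

The one-sided figure comes out at roughly one in three and a half million, and the two-sided figure at about half that.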

Left_Hand_of_Dorkness · April 18, 2012, 10:51am

I’ll ask the question again: what would be sufficient basis for suspecting a wonky coin (or some other wonkiness in results, e.g., incorrect reporting)? I’d start suspecting wonkiness once I got a 1-in-a-thousand result, probably, and become more convinced if I got a 1-in-10,000 result, and by the 1-in-100,000 result, you’d need to convince me that the coin was fair, not vice-versa.

Can anyone figure out the odds of rolling a 7 on six-sided dice 80 out of 120 times, by the way?
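
One way to put a number on the dice question, reading it as "two fair six-sided dice, at least 80 sevens in 120 rolls" (that reading is an assumption; the question could also mean exactly 80). The variable names are illustrative.

```python
from math import comb

# Assumption: two fair dice, so 6 of the 36 equally likely outcomes sum to 7.
p_seven = 6 / 36
n_rolls, n_sevens = 120, 80

# Binomial upper tail: probability of n_sevens or more sevens in n_rolls rolls.
p_tail = sum(
    comb(n_rolls, k) * p_seven**k * (1 - p_seven)**(n_rolls - k)
    for k in range(n_sevens, n_rolls + 1)
)

print(f"P(at least {n_sevens} sevens in {n_rolls} rolls) = {p_tail:.3g}")
```

With an expected count of only 20 sevens, the result is many orders of magnitude smaller than the coin example.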

Acsenray · April 18, 2012, 11:03am

I’m not going to follow this hijack, because I’m not interested in it. Would you care to address my actual question?

Left_Hand_of_Dorkness · April 18, 2012, 12:00pm

Acsenray:

What about the more general issue? I’m not interested in arguing about what Bricker did or didn’t say, but I think there’s something here that’s more broadly relevant to public discourse.

Isn’t there some basic logical fallacy regarding taking generalized statistics and using them to determine the facts of a particular case?

For example. I have flipped a quarter 100 times and 75 times it has come up heads and 25 times it has come up tails. So then I say, “According to the statistics, there is a 75 percent chance that the next flip will be heads.”

Now, we know that this is wrong, because we know the probabilities regarding a flipped coin. But what if we didn’t? Isn’t it still some kind of fallacy to take a generalized statistic and apply it to a specific case when we have not shown that the statistic grows out of an inherent characteristic of the coin?

I still don’t think that talking about the fairness of the coin is a hijack–indeed, it’s the only way to relate it to the topic of conversation.

If 55/100 coin flips come up heads, I shouldn’t predict the next flip will be heads, because the chance of an unfair coin weighted 55% is far less than the chance of getting this result with a fair coin.

If 100/100 flips come up heads, I should predict the next flip will be heads, because the chance of a two-headed coin is much greater than the chance of getting this result with a fair coin.

Same thing in the real world, I think. If you’ve seen events unfold in a way that suggests normal statistical spreads, then you should predict results anywhere within the normal statistical range. But if the results are really wonky, then you need to start looking for reasons why they’re outside of the norm.
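
The 55/100 versus 100/100 comparison can be sketched as a two-hypothesis Bayesian update. Everything numeric here is an assumption chosen for illustration: the prior odds of meeting a rigged coin (1 in 1,000 for a 55% coin, 1 in 1,000,000 for a two-headed coin) are invented, and `posterior_alt` is a hypothetical helper, not anything from the thread.

```python
def posterior_alt(prior_alt, p_alt, heads, flips, p_null=0.5):
    """Posterior probability of the 'rigged' hypothesis against a fair coin.

    prior_alt: prior probability that the coin is rigged (an assumption)
    p_alt:     chance of heads if the coin is rigged
    """
    tails = flips - heads
    like_alt = p_alt**heads * (1 - p_alt)**tails
    like_null = p_null**heads * (1 - p_null)**tails
    weighted_alt = prior_alt * like_alt
    return weighted_alt / (weighted_alt + (1 - prior_alt) * like_null)

# 55 heads in 100: the data barely favor a 55% coin, so the prior dominates.
mild = posterior_alt(prior_alt=1e-3, p_alt=0.55, heads=55, flips=100)

# 100 heads in 100: the data overwhelm even a one-in-a-million prior.
extreme = posterior_alt(prior_alt=1e-6, p_alt=1.0, heads=100, flips=100)

print(f"P(55% coin | 55/100 heads)       = {mild:.4f}")
print(f"P(two-headed coin | 100/100 heads) = {extreme:.6f}")
```

With these made-up priors, the first posterior stays well under 1% while the second is essentially 1, which matches the intuition in the post. Different priors would shift the numbers but not the overall pattern.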

Bricker · April 18, 2012, 12:10pm

monstro:

I’m not sure I understand what you’re saying.

Say you know that the crime statistics show that people who have a surname beginning with a “Z” are disproportionately convicted as murderers.

You are presented with a homicide case in which the suspect’s last name starts with a “Z”. The guy says he’s innocent.

You are asked to weigh in.

Do you say, “I don’t have enough information to weigh in”?

Or do you say, “Well, I know I don’t have enough information to give an intelligent opinion, but I do know that people with “Z” surnames are over-represented in the crime stats. I’m not saying that makes him guilty, but that means something about something, at least in theory, right?”

Your answer to this question will help me understand what exactly you’re arguing.

(If it helps, you can replace “Z” surnames with a certain gender. Or race. Or socioeconomic status. Whatever floats your boat.)

In that sequence, I would say, “I don’t have enough information to weigh in.”

BUT: if someone else at that discussion said, “I do know that people with “Z” surnames are over-represented in the crime stats!” then I would say, “Yes, that’s a true statement. It’s not useful because here we have actual facts to use, but your statement is absolutely true. Just not useful.”

Barrett_Bonden · April 18, 2012, 12:14pm

Going back to what Evil Economist said:

I think the problem with introducing an accurate but inapplicable statement into a discussion is that by bringing it up in the context of the discussion, you are causing people to presume that you believe it is relevant to the discussion.

Bricker · April 18, 2012, 12:16pm

I don’t know. I can’t picture myself making such a statement out of the blue. What I can easily imagine – because it happened – is for someone else to make the statement, other posters to call it false, and me to weigh in saying that it’s actually true, though of course only in that highly theoretical situation in which we know nothing else.

Bricker · April 18, 2012, 12:21pm

But I didn’t. So far as I recall, it was Huerta88 that first brought it up, other posters that declared it was false, and I then said, with increasing frustration, that it was obviously true, keeping in mind that it remained true only in the absence of any other data, which I also kept repeating. And my frustration was born and fed from the fact that the folks on the other side would not say, “Yes, technically true, useless in real life.” They insisted instead on, “No, utter garbage,” or as monstro says here, wrong to ever mention. This baffles me.

And I’m not slamming monstro here; I am just baffled as to why we cannot acknowledge the technical truth of the proposition… as you have, I might point out.

Acsenray · April 18, 2012, 1:19pm

The point is that the observer doesn’t know what the actual normal statistical range is and he or she is trying to derive it from whatever statistics are available. That’s where the logical fallacy comes in that I’m trying to identify. If he knew that the actual probability was 50-50, then there wouldn’t be a problem, would there?

Barrett_Bonden · April 18, 2012, 1:34pm

I believe that most of the folks you mention (including you) are arguing in good faith and I think context is particularly crucial to this argument. I think the source of the disagreement about “true” versus “not true” is just that most people assume that a statement made in a discussion is intended to be pertinent to that discussion. So you were correct that the statement was “true” in a vacuum; but others were correct that vacuum conditions did not apply in the debate, so the statement was just not applicable.

You’re probably more sensitive to this than I am, but the reason that people are critical of the use of these sorts of statistical analyses in discussions about race and crime is that statistics are frequently misapplied to race and crime in insulting and harmful ways. If someone insisted that a theoretical statement is “garbage,” I am positive that she/he meant that it is valueless in the context of that specific discussion. Or not merely valueless, but actually harmful, since it can be used wrongly to lead people to inaccurate and damaging conclusions. I think that is the source of the strongly worded disagreements that you’re citing.

septimus · April 18, 2012, 1:50pm

I didn’t read the linked-to thread, but it’s certainly quite easy to quote statistics completely out of context, while not actually lying.

I’m curious how often this is done, in courts of law, by lawyers or expert witnesses.

you_with_the_face · April 18, 2012, 2:37pm

I remember very well what Huerta88’s position was. He argued that because white-on-black rape is “vanishingly rare” according to some misinterpreted FBI crime statistics, then this suggested the Duke accuser was lying. You defended this as being a sound, reasonable, and well-argued point of view. For millions of pages you defended this idea, despite being told this was wrong by people better acquainted with statistics than you.

Do you still stand by this position?

Furthermore, do you not see the difference between this argument and saying that if 8% of all rape allegations are false and we know nothing else about a particular rape allegation, there’s an 8% chance it’s false? Notice that these are two completely different positions. Taking crime stats on an arbitrarily selected variable like race and extrapolating that to an accuser’s credibility is more egregiously wrong than looking at the frequency of false allegations and making inferences about the truthfulness of a random rape accusation.

steronz · April 18, 2012, 3:39pm

I still can’t find the parts of the original thread that are relevant, but I found the resulting pit thread, in which Bricker says, roughly,

“I felt that Huerta88, since he was presenting good cites and making a cogent argument, was a more convincing debater. I was simply playing the role of an outside observer and announced that I felt that one side was ‘winning.’ you with the face, on the other hand, was simply saying that the stats Huerta88 had cited were irrelevant, without posting her own cites showing how or why they were irrelevant.”

He then went on to admit that other posters eventually came along and convinced him that Huerta88’s statistics were indeed irrelevant, and that’s where I stopped reading the pit thread. So it appears that we have a case here of Bricker jumping on the wrong bandwagon. Once he realized he was on the wrong bandwagon, he claimed he had jumped on it for the right reasons, instead of (or maybe in addition to) admitting that he was unqualified to judge which bandwagon he should have jumped onto in the first place.

It’s worth pointing out that Huerta88 was ultimately wrong about his use of statistics, and you with the face was ultimately correct. I think Bricker is walking a fine line, perhaps 1 weasel width wide, by saying he was merely defending Huerta88’s statistical accuracy, and not Huerta88’s incorrect use of said statistics, or defending his debating ability and not his actual position.

As an outside observer, I think it would have been better if Bricker had jumped off the bandwagon as soon as he realized it was the wrong one, and then just admitted that he was on the wrong bandwagon instead of defending his rationale for getting on the wrong bandwagon in the first place.

Bricker · April 18, 2012, 3:42pm

Barrett_Bonden:

I believe that most of the folks you mention (including you) are arguing in good faith and I think context is particularly crucial to this argument. I think the source of the disagreement about “true” versus “not true” is just that most people assume that a statement made in a discussion is intended to be pertinent to that discussion. So you were correct that the statement was “true” in a vacuum; but others were correct that vacuum conditions did not apply in the debate, so the statement was just not applicable.

You’re probably more sensitive to this than I am, but the reason that people are critical of the use of these sorts of statistical analyses in discussions about race and crime is that statistics are frequently misapplied to race and crime in insulting and harmful ways. If someone insisted that a theoretical statement is “garbage,” I am positive that she/he meant that it is valueless in the context of that specific discussion. Or not merely valueless, but actually harmful, since it can be used wrongly to lead people to inaccurate and damaging conclusions. I think that is the source of the strongly worded disagreements that you’re citing.

Perhaps, and maybe it was foolish of me to continue to insist on a concession that had zero practical applicability to the main discussion, but at the same time it seemed to me – and still does – that on a website supposedly devoted to fighting ignorance, we can in fact say what is true and at the same time acknowledge that we’re talking about something true only in a vacuum.

I get the danger in, say, a newspaper article, of making a similar statement. But in the context of The STRAIGHT Dope, I am sorry, but we should be able to handle the distinction.

In my opinion.

Bricker · April 18, 2012, 3:51pm

No.

Well, I still defend the position that as between you and Huerta88, he was calmly offering cites and supporting arguments, and you were simply gainsaying his position, so that I found him a more effective rhetor.

But I don’t stand by the idea that there is any applicability to the Duke accuser. And I was moved from that position by people who offered cites and supporting arguments. And I conceded that back in that thread.
