Boys and Girls, there's lies, damn lies, and statistics. From the Monty Hall Problem column

Okay, so in a discussion with a friend, both the Monty Hall problem and the two-siblings problem (where the gender of one child is specified) came up.

Specifically discussed in: http://www.straightdope.com/columns/read/916/on-lets-make-a-deal-you-pick-door-1-monty-opens-door-2-no-prize-do-you-stay-with-door-1-or-switch-to-3
I’m not sure about the Monty Hall problem, but I’ve done Science™ on this claim:

Further explained as:

Basically, the logic here is that there are 3 sets: M-F, F-F, and F-M, so each has about a 1/3 chance of happening. Two of the sets are opposite-sex, so the chance of an opposite-sex pair is 2/3.

This just doesn’t hold up, though. F-M and M-F are the same set, just rearranged. The logical fallacy here is copying a set, flipping it around, and calling it a different but possible set.

The right answer is 1/2 or 50%.

To prove it, I spent all of 10 minutes writing this JavaScript. I substituted 1 and 0 for the genders.

It first randomly decides which comes first, the known gender or the unknown one. Then it randomly assigns a gender to the unknown kid, and adds one to the M-M, F-M, or M-F count depending on the result.



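<html>
<script>
// 1 = boy, 0 = girl. One kid is known to be a boy; the other kid is random.
var looper = 1;
var limiter = 10000;
var boys = 0;   // boy-boy
var girl1 = 0;  // girl-boy (first kid is the girl)
var girl2 = 0;  // boy-girl (second kid is the girl)

while (looper <= limiter) {
	// Randomly decide whether the known boy is the first or the second kid.
	var tester = Math.round(Math.random());
	var kid1, kid2;

	if (tester == 1) {
		kid1 = 1;
		kid2 = Math.round(Math.random());
	} else {
		kid1 = Math.round(Math.random());
		kid2 = 1;
	}

	if ((kid1 == 1) && (kid2 == 1)) {
		boys++;
	}

	if ((kid1 == 0) && (kid2 == 1)) {
		girl1++;
	}

	if ((kid1 == 1) && (kid2 == 0)) {
		girl2++;
	}

	looper++;
}
var girltotal = girl1 + girl2;
var percentg = Math.round(girltotal / limiter * 100);
var percentb = Math.round(boys / limiter * 100);

// The message is one string literal per line; a raw line break inside a
// string literal is a syntax error in JavaScript.
alert("boy boy is " + boys + ". girl boy is " + girl1 + ". boy girl is " + girl2 +
	". total percents are " + percentb + "% same sex and " + percentg + "% opposite sex.");
</script>
</html>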


The results:
~25% M-F
~25% F-M
~50% M-M.

Or 50% opposite sex and 50% same sex. Sometimes common sense is right.

That’s the part that’s wrong, right there. They ain’t “same set, just rearranged”. {Elder child male, younger child female} is not a rearrangement of {elder child female, younger child male}.

That part isn’t necessarily relevant (or at least it feels like it doesn’t belong). We currently have 4 options to begin with:

MM
MF
FM
FF

GIVEN that one gender is guaranteed to show up at least once (we’ll say female), we’re left with the following:
MF
FM
FF

Herein lies the 2/3 vs. 1/2 problem: one view is P(A ∩ B), where A is "exactly one child is female" and B is "at least one child is guaranteed female." P(A ∩ B) is 2/3, which is true.

The other view, the 1/2 result, is P(A | B): the probability of A given that B has already occurred, using the same definitions of A and B.

My thought on this one is to go with the 50% answer: you KNOW that 1 of the 2 children is female, and the other child is independent of the first. I chalk up the difference in answers to the question being worded badly, but it could be my understanding that is faulty.

What the others have said, but perhaps stated in a different way:

Do you agree that the four original sets were MF, MM, FM, and FF? That is, do you agree that in a family with two kids, each of those sets has an equal probability of happening?

If yes, then you’re done. There’s no reason to change your mind after eliminating one set.

If no, then write a java program to flip two coins a thousand times. You’ll get one head and one tail 1/2 the time, two heads 1/4 of the time, and two tails 1/4 of the time. Mixed results are twice as frequent as either set of paired results.
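A minimal JavaScript sketch of that coin experiment, assuming fair coins and the same Math.random() approach used elsewhere in the thread; the function name `flipTwoCoins` is only an illustrative choice. It counts "one head, one tail" without caring which coin shows which face, which is exactly the lumping that makes the mixed result come up about twice as often as either matched pair.

```javascript
// Flip two fair coins `trials` times and tally the three kinds of outcome.
function flipTwoCoins(trials) {
  const counts = { twoHeads: 0, twoTails: 0, mixed: 0 };
  for (let i = 0; i < trials; i++) {
    const first = Math.random() < 0.5; // true = heads
    const second = Math.random() < 0.5;
    if (first && second) counts.twoHeads++;
    else if (!first && !second) counts.twoTails++;
    else counts.mixed++; // one head and one tail, in either order
  }
  return counts;
}

const result = flipTwoCoins(1000);
console.log(result);
```

With only 1000 flips the counts wobble from run to run; raising the trial count brings the proportions closer to 1/2, 1/4, and 1/4.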

I think the problem is that you don’t know WHICH of the 2 children is female, so you can’t treat the two children as independent of one another. The statement "one of the two is female" cannot be evaluated with only one child. Therefore both A and B can only be evaluated after each child’s gender has been established.
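Written out with the A and B defined above (A: exactly one child is female; B: at least one child is female), and assuming the four orderings MM, MF, FM, FF are equally likely, the conditioning step works out as follows. Every outcome in A is also in B, so A ∩ B is just {MF, FM}:

```latex
P(B) = P(\{MF, FM, FF\}) = \tfrac{3}{4}, \qquad
P(A \cap B) = P(\{MF, FM\}) = \tfrac{2}{4}, \qquad
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/4}{3/4} = \frac{2}{3}
```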

I’m not sure if I’m explaining this right, so I’ll try the following:
To conduct the experiment represented by the JavaScript in the real world, you could not arbitrarily declare the gender of one child ahead of time and then just add a second child of random gender and random age, which is what Tao is doing. You’d have to pick groups of two children, and then filter out the groups that did not contain a certain gender. To express this (the male version) in JavaScript, I offer the following to paste into Firefox’s or Chrome’s URL bar:

javascript:
var looper = 1;
var limiter=10000;
var boys = 0;
var girl1 = 0;
var girl2 = 0;

while(looper<=limiter) {
    var kid2 = Math.round(Math.random());
    var kid1 = Math.round(Math.random());

    if ((kid1==1) || (kid2==1)) { /*removing 0,0 results*/

        looper++;   /*not counting 0,0 results for percentage*/

        if ((kid1==1) && (kid2==1)) {
            boys++;
        }

        if ((kid1==0) && (kid2==1)) {
            girl1++;
        }

        if ((kid1==1) && (kid2==0)) {
            girl2++;
        }
    }

}
var girltotal=girl1+girl2;
var percentg=Math.round(girltotal/limiter*100);
var percentb=Math.round(boys/limiter*100);

alert("Out of "+limiter+" tries, boy boy is "+boys+". girl boy is "+girl1+
". boy girl is "+girl2+". total percents are "+percentb+"% same sex and "+
percentg+"% opposite sex.");

Pretty much what everyone else said, but more specifically, this quote right here is the issue. What you’re doing is answering a different question than the one in Cecil’s column.

The difference between the two questions is that, in the one you’re illustrating, the two children are differentiated. In this case they’re differentiated not by age, but by one child being identified by gender and the other one not. As an example:

  1. You’re at a party with 100 couples, each of whom has two children. You ask each couple: “Choose one of your children, but don’t tell me which one. Now: is the child you chose a girl?” Of those couples that say, “yes,” how many have a son?

  2. You’re at a party with 100 couples, each of whom has two children. You ask each couple: “Do you have any daughters?” Of those couples that say, “yes,” how many have a son?

These two questions are different. Cecil asked the second, you answered the first. It’s not surprising you have different answers.
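The two party questions can be simulated side by side. This is a minimal sketch, assuming each child is independently a boy or a girl with probability 1/2; the helper names `randomChild` and `simulateParty` are hypothetical. The only difference between the two tallies is which couples get to answer "yes."

```javascript
// Each child is independently a boy ("B") or a girl ("G") with probability 1/2.
function randomChild() {
  return Math.random() < 0.5 ? "G" : "B";
}

function simulateParty(couples) {
  let q1Yes = 0, q1HasSon = 0; // question 1: "is the child you chose a girl?"
  let q2Yes = 0, q2HasSon = 0; // question 2: "do you have any daughters?"

  for (let i = 0; i < couples; i++) {
    const kids = [randomChild(), randomChild()];
    const hasSon = kids.includes("B");

    // Question 1: the couple secretly picks one of their two children at random.
    const chosen = kids[Math.random() < 0.5 ? 0 : 1];
    if (chosen === "G") {
      q1Yes++;
      if (hasSon) q1HasSon++;
    }

    // Question 2: having any daughter at all counts as "yes".
    if (kids.includes("G")) {
      q2Yes++;
      if (hasSon) q2HasSon++;
    }
  }
  return { q1: q1HasSon / q1Yes, q2: q2HasSon / q2Yes };
}

// 100 couples would be very noisy, so simulate many more.
const party = simulateParty(100000);
console.log(party); // q1 should land near 0.5 and q2 near 0.667
```

Question 1 singles out a particular child, so the other child is a fresh 50/50. Question 2 only removes the two-boy families, leaving three equally likely orderings, two of which include a son.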

BigT, I see what you’re trying to do, but you’re considering more than 10,000 sets. The 0,0 sets are part of the pool even though they’re eliminated out of hand, yet you’re running them through the statistics: you’re dividing the results of ~13,333 sets by 10,000. Consider this version:


javascript:
var looper = 1;
var limiter=10000;
var boys = 0;
var girl1 = 0;
var girl2 = 0;
var setcount = 0;

while(looper<=limiter) {
    var kid2 = Math.round(Math.random());
    var kid1 = Math.round(Math.random());

    if ((kid1==1) || (kid2==1)) { /*removing 0,0 results*/

        looper++;   /*not counting 0,0 results for percentage*/

        if ((kid1==1) && (kid2==1)) {
            boys++;
        }

        if ((kid1==0) && (kid2==1)) {
            girl1++;
        }

        if ((kid1==1) && (kid2==0)) {
            girl2++;
        }
    }
    setcount++;   /*counts every generated pair, including the discarded 0,0 ones*/
}
var girltotal=girl1+girl2;
var possiblesets=girl1+girl2+boys;
var unpossibles=setcount-possiblesets;
var percentg=Math.round(girltotal/limiter*100);
var percentb=Math.round(boys/limiter*100);
alert("Out of "+setcount+" tries, boy boy is "+boys+". girl boy is "+girl1+". boy girl is "+girl2+". sets that were put into the statistics even though they were ruled out of hand is "+unpossibles+".");


Girl-Girl is ruled out of hand. That shouldn’t be considered at all. Yet the program considers it ~3333 times that way.

Consider it another way: we have polls now.

Anyone object to this poll question? "Dopers having two kids, with one of them a boy, do you have:
1: two boys
2: boy-girl
3: girl-boy”

That’s a great poll question. Or, if you prefer to think of it as not mattering, just ask if they have:

  1. 2 boys
  2. 1 boy and 1 girl

Oh, I agree there are really only 2 sets, but that’s the source of the debate. The 2/3 side feels girl-boy and boy-girl are different sets, each with the same probability as boy-boy.

I’d therefore like to keep them separate in the poll. It’s my belief that if boy-girl and girl-boy are the same set, then the artificial divide will cut them in half, resulting in 1/4 boy-girl, 1/4 girl-boy, and 1/2 boy-boy. The JavaScript in the OP demonstrates this, but it’s disputed.

So I’d like to set up an undisputed experiment.

Wait, you think that, of all families with two children, the same proportion of families has one of each sex as has two boys?

So let’s explore that. Ignore Cecil’s question for a minute and consider *all* families with two children. If the number of families with two boys equals the number of families with a boy and a girl, doesn’t that mean that it also equals the number of families with two girls?

And so of all the families with two children, 1/3 have two boys, 1/3 have two girls, and 1/3 have a boy and a girl?

Do you agree with that? If not, why not? If so, can you write some JavaScript to show why that is so?

No, because in that instance you have two sets, opposite-sex pair and same-sex pair, 50/50. Same sex can be further divided into boy-boy and girl-girl, meaning opposite sex is 50%, boy-boy is 25%, and girl-girl is 25%.

Whereas in Cecil’s question the gender of one of the children is specified, meaning the only variable is the other child’s gender.

Consider this thought experiment.

You take two quarters, and put one on your desk heads up, then flip the other one. What are the possible outcomes? Heads-heads, and heads-tails, 50/50.

Now say you put one quarter heads up, flip the other one, then shuffle them around randomly. Does that change the likelihood of getting two heads?

The whole point of Cecil’s question is that there is no specific child whose gender is specified.

Your thought experiment is answering a different question than the one Cecil asked. Since it’s a different question, it’s not surprising the answers are different.

Continuing on with your poll idea:

Suppose there are 100 Doper families with two total children. I assume you agree that, of the 100, the breakdown will be roughly into four groups like so:

Group A (25): Eldest child BOY, youngest child BOY
Group B (25): Eldest child BOY, youngest child GIRL
Group C (25): Eldest child GIRL, youngest child BOY
Group D (25): Eldest child GIRL, youngest child GIRL

So let’s ask your poll question: "Of the dopers having two kids, with at least one of them a boy, do you have:
1: two boys
2: boy-girl
3: girl-boy"

So what, exactly, would you predict the numerical breakdown of the answers to be, and which Doper families from groups A-B-C-D would be included in which poll answer 1-2-3?
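One way to tabulate an answer, as a sketch that uses only the post's own assumptions: 25 families in each of groups A to D, and "boy-girl" read as eldest boy, youngest girl. The `pollAnswer` helper is hypothetical; it maps each group to the poll answer it would give, or excludes it if the family has no boy.

```javascript
// The four groups and their sizes are the ones assumed in the post.
const groups = [
  { name: "A", eldest: "BOY", youngest: "BOY", families: 25 },
  { name: "B", eldest: "BOY", youngest: "GIRL", families: 25 },
  { name: "C", eldest: "GIRL", youngest: "BOY", families: 25 },
  { name: "D", eldest: "GIRL", youngest: "GIRL", families: 25 },
];

// Which poll answer a group would give, or null if it has no boy and skips the poll.
function pollAnswer(group) {
  if (group.eldest === "BOY" && group.youngest === "BOY") return "two boys";
  if (group.eldest === "BOY") return "boy-girl";
  if (group.youngest === "BOY") return "girl-boy";
  return null;
}

const tally = {};
let respondents = 0;
for (const group of groups) {
  const answer = pollAnswer(group);
  if (answer === null) continue; // group D does not qualify
  tally[answer] = (tally[answer] || 0) + group.families;
  respondents += group.families;
}

for (const [answer, count] of Object.entries(tally)) {
  const pct = ((100 * count) / respondents).toFixed(1);
  console.log(`${answer}: ${count} of ${respondents} (${pct}%)`);
}
// Prints "two boys: 25 of 75 (33.3%)", then the same for boy-girl and girl-boy.
```

Groups A, B, and C each land in a different answer, and group D drops out, so under these assumptions each answer gets one third of the respondents.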

The real problem with this one is that there are very few real-world situations where you’d know that someone has at least one daughter, without also knowing which one it is. I can come up with an example, but it’s pretty contrived: You basically need something like “Everyone who has at least one daughter, raise your hand”. On the other hand, it’s really easy to find real-world examples where you know something about a specific child (for instance, you see one of the children).

Here’s a good explanation:

Imagine flipping two pennies. I claim there’s four possible combinations:

Heads:Heads
Tails:Tails
Heads:Tails
Tails:Heads

You claim that it doesn’t make a difference which penny is heads or tails, what you have is one penny is heads and one penny is tails. Thus, it should be

Heads:Heads
Tails:Tails
Heads:Tails (i.e. one penny is heads and one is tails).

Okay, I say, let’s substitute a quarter for one of the pennies. Now we have:

Quarter Heads: Penny Heads
Quarter Tails: Penny Tails
Quarter Heads: Penny Tails
Quarter Tails: Penny Heads

Do you agree that the last two are two separate instances? For example, I bet you that the quarter will come up heads and the penny will come up tails. If the penny comes up heads and the quarter comes up tails, would you agree that I lost? Would you agree that my bet has only a one-in-four chance of succeeding?

The only difference between the two is the type of coin I am using. Certainly, the fact that one of the coins is a big shiny silver colored coin instead of a small dark brass colored coin shouldn’t make any difference in the odds.
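The quarter-and-penny argument can be checked by listing every outcome. This sketch assumes fair coins, so each of the four outcomes is equally likely; it counts the specific bet (quarter heads, penny tails) separately from the looser "one heads, one tails" description.

```javascript
// Enumerate every quarter/penny outcome; each of the four is equally likely for fair coins.
const faces = ["Heads", "Tails"];
const outcomes = [];
for (const quarter of faces) {
  for (const penny of faces) {
    outcomes.push({ quarter, penny });
  }
}

// The bet from the post: quarter heads AND penny tails.
const wins = outcomes.filter((o) => o.quarter === "Heads" && o.penny === "Tails");
// "One coin heads, one coin tails" without caring which coin is which.
const mixed = outcomes.filter((o) => o.quarter !== o.penny);

console.log(`${outcomes.length} outcomes; the bet wins in ${wins.length}`);
console.log(`one heads and one tails (either coin): ${mixed.length} of ${outcomes.length}`);
// Prints "4 outcomes; the bet wins in 1" and "one heads and one tails (either coin): 2 of 4"
```

The bet covers one of the four outcomes, while the unordered "one of each" description covers two, which is why it is twice as likely as two heads.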

It’s the same chances. A quarter has two faces, just like a kid has two gender possibilities.

How is it meaningfully different in order to change the outcome?

1:~50%
2:~25%
3:~25%

Because you’re looking at it wrong. There are 3 possible sets in this problem, but two ways of getting boy-boy. Here’s why: one is a boy, but it could be either one, and the other can be either a boy or a girl.

Defining sB as referring to the specified boy, rB as a random boy and G as a random girl we have:

1: sB-rB 25%
2: rB-sB 25%
3: sB-G 25%
4: G-sB 25%

1 and 2 are the same result, so they combine to make 50%.

I’ve put together some R code to illustrate the difference between the two problems.

First, we have the case where we are told that at least one of the two children is a girl:



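N <- 100000
# Each column is one child; TRUE = girl, FALSE = boy (probability 1/2 each).
trials <- cbind(runif(N), runif(N)) < .5
castout <- length(which(trials[,1] == FALSE & trials[,2] == FALSE))  # no girl: discard
fails <- length(which(trials[,1] == TRUE & trials[,2] == TRUE))      # two girls, no son
succs <- N - (castout + fails)                                       # one girl and one boy
p <- succs / (N - castout)
p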


Second, we have the case where we are told that the first child is a girl:



N <- 100000
trials <- cbind(runif(N), runif(N)) < .5
castout <- length(which(trials[,1] == FALSE))                        # first child is a boy: discard
fails <- length(which(trials[,1] == TRUE & trials[,2] == TRUE))      # two girls, no son
succs <- N - (castout + fails)                                       # first girl, second boy
p <- succs / (N - castout)
p


In both cases, we throw out the results that don’t meet our criteria, but as you can see, what we cast out is different in both cases. In fact, the results that we throw out in the second case all meet the criteria for being thrown out in the first case, but the reverse is not true. Therefore, the final answers will be different. As other posters have explained, in the first case p is roughly 2/3, and in the second it’s roughly 1/2.

A man tells me he has two coins in his pocket. Either coin might be a quarter or a penny. He produces one coin, and it happens to be a quarter. He bets me that I can’t tell him what the other coin is. So according to the 2/3 supporters, I should always bet on the penny because I have a two-in-three chance of winning the bet, whereas a quarter has only a one-in-three chance of being the right choice?

It’s meaningfully different because the language restricts which subset of families you choose as the total population. In one case you restrict the population to all families who have *at least one* boy; in the other case you restrict the population to all families where a particular child is a boy. Those are *different* questions, *different* populations, and *different* answers.

One more time. *So what, exactly, would you predict the numerical breakdown of the answers to be, and which Doper families from groups A-B-C-D would be included in which poll answer 1-2-3?*

One more time: The whole point of Cecil’s question is that there is no specific child whose gender is specified.

No, because, one more time: that’s a different problem.

Once you specify the gender of a *particular* child, or the value of a *particular* coin, then the chance of the other child or coin being of one type or another is 50%.

But that’s not the question in Cecil’s column.

Let me restate your hypothetical:

A man tells me he has two coins in his pocket. Either coin might be a quarter or a penny (with a 50-50 chance, I’m assuming you meant). You ask the man if he has at least one quarter, and he says he does. Now: what are the chances he has exactly two quarters?

If you want three sets, I agree it’s still an OK poll. It’s not going to change how many people out there with exactly two kids, one a boy, will have the other child be a girl versus a boy.

Either way I predict ‘two boys’ will be about 1/3 of the respondents.

Well, this wouldn’t be fair because in the real world people don’t get to choose to have one boy, then leave the other child’s sex up to chance. Or leave the first child’s sex to chance, then force the second child to be a boy. If that were the case, then yes, you’d get 50/50.