Monty Hall Redux

That’s just it though - you don’t know whether the boy is older or younger than his sibling, and that provides different information than if you do know whether he’s older or younger. Look at ultrafilter’s four cases. They’re equally likely, correct? If you know the boy is older (or younger), you eliminate two of the possibilities and you’re left with a 50-50 chance as you say. But if you only know one is a boy and not whether he’s older or younger, then you’ve only eliminated one of the possibilities - the girl-girl one - and you’re left with two of your remaining three possibilities having a girl as the remaining sibling, i.e. a two-thirds chance.
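For anyone who wants to check the counting, here is a minimal R sketch that enumerates the four cases and compares the two situations. It assumes each child is independently a boy or a girl with probability 1/2, and the names `families`, `p_older_known` and `p_at_least_one` are just illustrative.

```r
# Four equally likely (older, younger) combinations; assumes each child is
# independently a boy or a girl with probability 1/2.
families <- expand.grid(older = c("Boy", "Girl"), younger = c("Boy", "Girl"),
                        stringsAsFactors = FALSE)

# Case A: we know the older child is a boy.
older_boy <- families$older == "Boy"
p_older_known <- mean(families$younger[older_boy] == "Girl")

# Case B: we only know that at least one child is a boy.
any_boy <- families$older == "Boy" | families$younger == "Boy"
has_girl <- families$older == "Girl" | families$younger == "Girl"
p_at_least_one <- mean(has_girl[any_boy])

# Probability that the sibling is a girl in each case.
print(c(older_known = p_older_known, at_least_one = p_at_least_one))
```

Under those assumptions the first number should come out as one half and the second as two thirds, matching the argument above.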

The main issue is that there is no specific boy being talked about. The setup of the original problem says essentially that there is at least one boy among the two children, not that one particular child (whether identified by age or any other quality) is a boy.

Yes, but as **BrianSK** was alluding to upthread, we always have some form of identifying information. What’s the difference between the following two sets?

  1. The oldest child is a boy and the youngest child is a boy.
  2. The oldest child is a boy and the youngest child is a girl.
  3. The oldest child is a girl and the youngest child is a boy.
  4. The oldest child is a girl and the youngest child is a girl.

  1. The one I know is a boy, the one I don’t know is a boy.
  2. The one I know is a boy, the one I don’t know is a girl.
  3. The one I know is a girl, the one I don’t know is a boy.
  4. The one I know is a girl, the one I don’t know is a girl.

What if you don’t know either of them?


Here’s an even stranger example of how conditional probabilities depend on exactly what information you’re conditioning on. Suppose that we’re looking at families with two children, and we want to know the probability that a family has two boys given that they have at least one boy born on a Tuesday. As is standard, we assume that everything is uniformly distributed over the appropriate category and independent of everything else.

In this case, there are 196 different equally likely outcomes, so it’s not all that much fun to write them all out, but that’s what computers are for. Here’s R code to compute the desired probability:



# Enumerate all 2 * 2 * 7 * 7 = 196 equally likely (sex, sex, day, day) outcomes.
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
sexes <- c("Boy", "Girl")
events <- expand.grid(Sex1 = sexes, Sex2 = sexes, Day1 = days, Day2 = days)
# Keep the outcomes with at least one boy born on a Tuesday.
index <- (events$Sex1 == "Boy" & events$Day1 == "Tuesday") |
  (events$Sex2 == "Boy" & events$Day2 == "Tuesday")
events.good <- events[index, ]
# Among those, find the outcomes where both children are boys.
index2 <- events.good$Sex1 == "Boy" & events.good$Sex2 == "Boy"
p <- sum(index2) / nrow(events.good)


p works out to be 13/27, or slightly more than 0.48.
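The same number can be had by counting instead of enumerating. This is only a sketch under the uniformity and independence assumptions stated above: each child is one of 14 equally likely (sex, day) pairs, so the families with at least one Tuesday boy are the 14 where the first child is one, plus the 14 where the second child is one, minus the 1 counted twice. The two-boy families among them are counted the same way with 7 in place of 14.

```latex
\[
P(\text{two boys} \mid \text{at least one boy born on a Tuesday})
  = \frac{7 + 7 - 1}{14 + 14 - 1}
  = \frac{13}{27}
  \approx 0.481
\]
```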

Then call it “the one I know the gender of…”

I happen to know a family with two children, and I have verified that the two children are not both girls.

So which of the two children do you know the gender of?

Since that’s just a convoluted way of saying that at least one is a boy, I know the gender of the one that is a boy.

You don’t necessarily have identifying information, and the tricky bit of the problem is that it isn’t giving you any, despite appearances.

No important difference. The problems are that (1) there’s no promise that the children are distinguishable by any attribute other than sex, and even then it would only distinguish them if they happen to be different, and (2) even if there is some such attribute, you haven’t been given any information tying “the one with attribute X” to his or her sex.

In particular, there is no “child that you know”. The language of the original problem that we’re talking about (if it’s the one I think) is: You have been told this family has a daughter. It’s tempting to imagine there’s one particular child — the oldest one, the blond one, the one whose name begins with a vowel, something — being revealed as a girl. But in fact the sentence only means, “the number of girls is at least 1”.

But you have no way of differentiating the two children, so there is no “that one” to refer to.

It’s sometimes easier to think in terms of absolute sets rather than probabilities, so try this one on for size:

There are exactly 1000 families with two children in my hometown. I have indeed verified that in exactly 750 of them, the two children are not both girls. Of those 750 families, how many do you think contain two boys?
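If it helps, here is a rough R sketch of the expected head count for that town. It assumes the four (older, younger) combinations are equally likely, so the numbers are expectations rather than anything measured, and `n_families`, `counts` and `share_two_boys` are made-up names.

```r
# Expected counts among 1000 two-child families, assuming each of the four
# (older, younger) combinations is equally likely. "boy_and_girl" covers two
# of those four combinations, hence the weight of 2.
n_families <- 1000
counts <- c(two_boys = 1, boy_and_girl = 2, two_girls = 1) / 4 * n_families

# Families where the two children are not both girls.
not_both_girls <- n_families - counts[["two_girls"]]

# Share of those families that have two boys.
share_two_boys <- counts[["two_boys"]] / not_both_girls

print(counts)
print(not_both_girls)
print(share_two_boys)
```

On those assumptions the 750 families split into two groups of unequal size, which is the asymmetry the question is getting at.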

Right, and if the daughter, whose gender you have revealed, has been selected randomly, then birth order or hair color or whatever else plays no part in the equation. Thus, I break it down as follows.

  1. The child whose gender I know is a girl, the other child is a boy.
  2. The child whose gender I know is a girl, the other child is a girl.
  3. The child whose gender I know is a boy, the other child is a boy.
  4. The child whose gender I know is a boy, the other child is a girl.

And the probability of the other child being a boy is 0.5.

Only if the child whose gender you reveal was NOT chosen randomly can you come up with a probability of 1/3 for the second child being a boy.
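The two readings can be put side by side in a short R sketch. It is only an illustration under stated assumptions: sexes are independent and 50/50, and in the first reading the revealed child is picked by a fair coin flip that has nothing to do with the sexes. The names `outcomes`, `p_random_reveal` and `p_told_at_least_one` are hypothetical.

```r
# Eight equally likely outcomes: the sex of each child plus which child is
# revealed. Assumes independent 50/50 sexes and a fair, independent choice
# of which child gets revealed.
outcomes <- expand.grid(child1 = c("Boy", "Girl"), child2 = c("Boy", "Girl"),
                        revealed = c(1, 2), stringsAsFactors = FALSE)
revealed_sex <- ifelse(outcomes$revealed == 1, outcomes$child1, outcomes$child2)
other_sex <- ifelse(outcomes$revealed == 1, outcomes$child2, outcomes$child1)

# Reading 1: a randomly chosen child is revealed to be a girl.
# Probability that the other child is also a girl.
random_reveal <- revealed_sex == "Girl"
p_random_reveal <- mean(other_sex[random_reveal] == "Girl")

# Reading 2: we are only told that at least one child is a girl.
# Probability that both children are girls.
at_least_one_girl <- outcomes$child1 == "Girl" | outcomes$child2 == "Girl"
both_girls <- outcomes$child1 == "Girl" & outcomes$child2 == "Girl"
p_told_at_least_one <- mean(both_girls[at_least_one_girl])

print(c(random_reveal = p_random_reveal, told_at_least_one = p_told_at_least_one))
```

The first reading gives one half and the second gives one third, so which answer is right depends on which of the two procedures produced the statement you were given.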

The wiki article actually has an explanation of the assumptions necessary for conditional probability to apply to this problem.

Yes, you can break it down that way, but the four possibilities don’t have equal probabilities. Say the child whose gender you know is a boy, then you’ve eliminated 1 and 2 and you’re left with 3 and 4. But 3 and 4 are not equally likely, i.e. 50-50. 4 is twice as likely as 3, because there are two ways for 4 to be true (boy oldest, then girl, or girl oldest, then boy), but only one way for 3 to be true.

I haven’t revealed the gender of any child. All I’ve told you is that they’re not both boys. This is a subtle distinction, but it’s at the heart of the problem.

Try this experiment on for size: Take two quarters and flip them. Write down the number of heads. After you repeat this about 100 times, look at the ratio of times you wrote “one” to times you wrote “two”. It will be pretty damn close to 2 with very high probability.

If you don’t want to do it by hand, here’s the R code for an equivalent experiment:



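# Flip two fair coins 100 times; each row is one trial, TRUE means heads.
M <- matrix(runif(2 * 100) < 0.5, ncol = 2)
# Number of heads in each trial.
N <- rowSums(M)
# Ratio of one-head trials to two-head trials.
ratio <- sum(N == 1) / sum(N == 2)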


I ran this a thousand times, and the average ratio was 2.07. Only 10% of the ratios were less than 1.5. If the true value were one, that basically wouldn’t happen.
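For anyone who wants to repeat that, here is one way the thousand runs could be wrapped up in R. It is a sketch, not necessarily the exact script used for the numbers above: the helper `one_ratio`, the seed, and the variable names are all assumptions, and the results will wobble a bit from seed to seed.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# One experiment: flip two fair coins n_flips times and return the ratio of
# one-head trials to two-head trials.
one_ratio <- function(n_flips = 100) {
  heads <- rowSums(matrix(runif(2 * n_flips) < 0.5, ncol = 2))
  sum(heads == 1) / sum(heads == 2)
}

# Repeat the experiment 1000 times and summarise the ratios.
ratios <- replicate(1000, one_ratio())
mean_ratio <- mean(ratios)
share_below <- mean(ratios < 1.5)

print(mean_ratio)
print(share_below)
```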

You’re still not framing the question in a way that would lead one to conclude 1/3 probability in the boy/girl case. As per the wiki linked to previously, the question should be stated like this:

So, to get your 1/3 probability you have to remove some of the randomization, which you have never done in your single-family boy/girl problem. You did do so in your quarter-tossing experiment and in your village of 1000 families, where you narrowed it down to the 750 who had at least one boy. However, even in the 1000 families problem, if you take the probability of a family having 2 boys only among the 750 families who have at least one boy, it’s going to be 1/2.

Your conclusion here is that out of 1000 families with two children, 250 have two girls, 375 have two boys, and 375 have a boy and a girl. I’m not sure how you could have concluded that, particularly given the obvious asymmetry.

You don’t think that, among all two-child families, 1/3 have two boys, 1/3 have two girls, and 1/3 have a boy and a girl, do you?