The boy/girl probability question (for the 50th time. I know. I'm sorry.)

Thudlow_Boink · October 19, 2006, 3:34pm

I think the OP has a point.

The way Paulos has worded the question, I am compelled to wonder, “Okay, how is it known that the family has at least one daughter?” And the most natural assumption, the way the situation is described, is that you’ve met her. That’s how you know her name. So, you know she’s a girl, and you know she has a sibling, and the question is whether that sibling is a girl or a boy, and the probability of each.

Malacandra · October 19, 2006, 3:41pm

I know!

I think there are two possible questions here, and they’re easily confused. If we say “Line up all the two-child families and bring me one at random. If the family includes a girl, we’ll continue with the question” we get a situation where it is 2/3 likely that the other child is a boy. But if we say “Line up all the two-child families. Pick a girl at random out of the line-up and bring me the family” we have a different situation - because half of the girl population is in GG families, and so it is now as likely that we will pick a GG family as a mixed-sex family.

Which might be where 'face and I have been talking past each other!
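
For anyone who wants to check the two procedures by brute force, here is a minimal sketch that enumerates the four equally likely two-child families instead of sampling them. The names `Family`, `Ratio`, `familyFirst`, `girlFirst` and `asProbability` are made up for illustration, and it assumes boys and girls are equally likely and independent.

```cpp
#include <cassert>

// One two-child family, by birth order (illustrative type, not from the thread).
struct Family { bool firstIsGirl; bool secondIsGirl; };

// The four equally likely families: BB, BG, GB, GG.
const Family kFamilies[4] = {
    {false, false}, {false, true}, {true, false}, {true, true}};

// Favourable cases over total cases, kept as an unreduced count.
struct Ratio { int hits; int total; };

// Procedure 1: pick a family at random and continue only if it includes a girl.
// Counts how many of the surviving families also include a boy.
Ratio familyFirst() {
    Ratio r{0, 0};
    for (const Family& f : kFamilies) {
        if (!f.firstIsGirl && !f.secondIsGirl) continue;  // boy-boy is discarded
        ++r.total;
        if (!f.firstIsGirl || !f.secondIsGirl) ++r.hits;  // family has a boy
    }
    return r;
}

// Procedure 2: pick a girl at random out of the whole line-up.
// Counts how many of those girls have a brother.
Ratio girlFirst() {
    Ratio r{0, 0};
    for (const Family& f : kFamilies) {
        if (f.firstIsGirl)  { ++r.total; if (!f.secondIsGirl) ++r.hits; }
        if (f.secondIsGirl) { ++r.total; if (!f.firstIsGirl)  ++r.hits; }
    }
    return r;
}

// Convert a count ratio to a probability.
double asProbability(Ratio r) {
    assert(r.total > 0);
    return static_cast<double>(r.hits) / r.total;
}
```

Under those assumptions `familyFirst()` counts 2 out of 3 and `girlFirst()` counts 2 out of 4, which is the 2/3 versus 1/2 split described above.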

don_t_ask · October 19, 2006, 3:46pm

No. If you pick a girl and get her family you eliminate all the BB families.

Malacandra · October 19, 2006, 3:52pm

Yes, but be careful!

We have the following four families to choose from:

Andy and Bob
Charlie and Alice
Brenda and David
Carrie and Delia

There’s your binomial distribution. Pick me a girl at random. True, you have eliminated the Andy/Bob family. But aren’t you going to hit the Carrie/Delia family as often as the other two put together?

you_with_the_face · October 19, 2006, 3:56pm

No, unless I’m misunderstanding your confusion.

Andy and Bob - 25%
Charlie and Alice - 25%
Brenda and David - 25%
Carrie and Delia - 25%

Take out Andy and Bob and the numbers become

Charlie and Alice - 33%
Brenda and David - 33%
Carrie and Delia - 33%

There are twice as many girl-boy combinations as girl-girl, so the chance of getting girl-boy is 2/3.

KidScruffy · October 19, 2006, 11:42pm

I think Malacandra’s second example had it where a girl was picked at random out of the lineup of all family-of-four children, and the question then is asked if she has a brother. So:

Alice - yes, has a brother
Brenda - yes, has a brother
Carrie - no, has no brother
Delia - no, has no brother

Of course, I’m no expert. I subscribe to the “Count the possible outcomes” school of probability. Is it going to rain? Well, either it will or it won’t. 50-50 chance!

ultrafilter · October 20, 2006, 12:56am

I ran a quick simulation to see how the distribution of boys and girls falls out:



#include <cstdio>   // printf
#include <cstdlib>  // rand, srand
#include <ctime>    // time

const int boy = 0;
const int girl = 1;

const int familyCount = 1048576;

// distribution[ first ][ second ] counts families by the sex of each child.
int distribution[ 2 ][ 2 ];

int main()
{
	distribution[ girl ][ girl ] = 0;
	distribution[ girl ][ boy ] = 0;
	distribution[ boy ][ girl ] = 0;
	distribution[ boy ][ boy ] = 0;

	// Seed the generator so each run produces a different population.
	srand( static_cast<unsigned>( time( NULL ) ) );

	for ( int count = 0; count < familyCount; count++ )
	{
		// Each child is independently a boy or a girl with equal probability.
		int firstChildGender = rand() % 2;
		int secondChildGender = rand() % 2;

		distribution[ firstChildGender ][ secondChildGender ]++;
	}

	printf( "Two girls: %d\n", distribution[ girl ][ girl ] );
	printf( "One of each: %d\n", distribution[ girl ][ boy ] + distribution[ boy ][ girl ] );
	printf( "Two boys: %d\n", distribution[ boy ][ boy ] );

	return 0;
}

Here’s the output (edited with commas for ease of reading):



Two girls: 262,192
One of each: 524,192
Two boys: 262,192

Note that we’re not dealing with a random sample here; this is the entire population. Pick a family at random. The probability that they have a girl is almost exactly 3/4, and the probability that they have a girl and a boy is almost exactly 1/2, so the conditional probability that they have a boy given that they have a girl is 2/3.
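
Spelled out with the counts from the output above, the arithmetic goes roughly like this (a worked check only, reading “they have a girl” as “at least one girl”):

```latex
\[
P(\text{boy} \mid \text{girl})
  = \frac{P(\text{girl and boy})}{P(\text{at least one girl})}
  = \frac{524{,}192 / 1{,}048{,}576}{(262{,}192 + 524{,}192) / 1{,}048{,}576}
  = \frac{524{,}192}{786{,}384}
  \approx 0.6666
\]
```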

David_Simmons · October 20, 2006, 3:32am

Isn’t the problem changed by the fact that the boy/girl birth ratio isn’t 1/1?

ragerdude · October 20, 2006, 5:28am

Well, how about this: I flip two coins. I tell you that the quarter came up heads. What is the probability that the other coin came up heads? The answer is 1/2. (Assume that there is exactly one quarter.)

I agree with ultrafilter’s mathematics, but I don’t think that the question is phrased in such a way that Paulos’s readers will always interpret it that way.

I suppose there might be some “diagonalization”-inspired claim that, no matter how many assumptions are made explicit, the problem can be made even more clear by the addition of another explicit assumption. If you add that one to the list, there still is another, etc.

In any question where there is an ambiguity we either leave the problem unanswered or we append a reasonable assumption that makes the problem solvable. For example, we both made the reasonable assumptions – unstated – that the sex of each child is independent of the others, and that the probability of each birth is exactly 1/2 male and 1/2 female, an idealized assumption that is not precisely true in the real world. But, it is a reasonable assumption.

Assuming that not both children are named Myrtle, the question states that a given unique individual (Myrtle) is a girl, and the other child’s gender is to be determined. This probability is 1/2.

Paulos’ version is inferior in many ways, and Thudlow Boink’s point is another way of pointing that out.

The question would have been much more convenient if Paulos had asked the question in such a way that makes sense. Given that there is at least one daughter, Paulos says that “her” name is Myrtle, as if a certain one of the two girls (for GG) had already been chosen. Paulos’s condition

“which is known to have at least one daughter”
is equivalent to

“which is known not to have two sons.”
It means no more or no less. Thus, rephrasing Paulos’ question by substitution, he is asking:

“Consider now some randomly selected family of four which is known not to have two sons. Say Myrtle is her name.”

In the interest of Paulos’s goal of fighting mathematical ignorance, I suggest that he write in such a way that is above any hint of ambiguity, especially with the reading public in mind. Things should be made quite precise for them – math is confusing enough already. Even if it can be shown that Paulos’s answer is correct, I feel that I am right in critiquing his phrasing (which was my only real dispute).

The notation (BG, GB, etc. – as birth order) is tailor-made for the question involving the “older child,” so it works that way for the second problem. What if we “order” them this way: {Myrtle, non-Myrtle}? Then possibilities {B, B} and {B, G} are eliminated, and {G, B} and {G, G} remain, and thus the probability is one out of these two, or 1/2.

If that’s what Paulos had meant, I wish he had phrased it as well as you just did. The answer here is clearly 1/3.

Malacandra · October 20, 2006, 2:37pm

That’s exactly right, if you’re picking a family first (and giving up in disgust if it’s boy-boy). But if you’re picking a girl out of all the girls in the pool, you are equally likely to get Alice, Brenda, Carrie or Delia. Hence, although there are only half as many girl-girl families as boy-girl, they’ll be selected pro rata twice as often. We’re both correctly answering different questions.

Marginally, but I think we were tactfully ignoring the fact that the ratio is about 1.02/1; it keeps the math simple. (We really need to ignore countries that are sex-selecting by abortion, of course.)
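
To put a number on “marginally”, here is a rough worked version. It assumes each birth is independently a boy with probability p, and it reads the 1.02/1 figure as boys to girls; both are assumptions for illustration.

```latex
\[
P(\text{has a boy} \mid \text{at least one girl})
  = \frac{2p(1-p)}{1-p^{2}}
  = \frac{2p}{1+p}
\]
\[
p = \tfrac{1}{2} \;\Rightarrow\; \tfrac{2}{3} \approx 0.667,
\qquad
p = \tfrac{1.02}{2.02} \approx 0.505 \;\Rightarrow\; \approx 0.671
\]
```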

I don’t understand what your “ordering” is with the {B, B}, {B, G} and {G, G} cases. You’re ordering them on something, obviously, but it certainly isn’t their Myrtlitude.

I thought this over far too much on the way home last night. It is misleadingly close to the famous Monty Hall problem, but not quite the same after all. If the condition is “I will select a two-child family at random and then I will tell you either that it contains a boy or a girl, and you must guess the sex of the other child”, then you’re back to a 50-50 guess. Reason: half the time the questioner will hit a single-sex family and has only one sex to tell you about. The other half, he will hit a mixed-sex family and (presumably) flip a coin as to whether to say “boy” or “girl”. That leaves you a 25% chance he hit a BB family and says “boy”; 25% he hit a mixed-sex family and says “boy”; 25% he hits a mixed-sex family and says “girl”; and 25% he hits a GG family and says “girl”. IOW, the likelihood of the other sibling being the opposite gender is exactly 50%.
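
The same bookkeeping can be sketched as an exact count. This assumes the four families are equally likely and that the questioner flips a fair coin in a mixed-sex family; `Ratio` and `otherIsBoyGivenGirlAnnounced` are hypothetical names for illustration.

```cpp
#include <cassert>

// Favourable weight over total weight, kept as an unreduced count.
struct Ratio { int hits; int total; };

// Given that the questioner says "girl", how often is the other child a boy?
// Weights are in halves: a boy-boy family can never produce "girl" (0),
// a mixed family produces it on a coin flip (1), a girl-girl family always (2).
Ratio otherIsBoyGivenGirlAnnounced() {
    Ratio r{0, 0};
    for (int first = 0; first < 2; ++first) {        // 0 = boy, 1 = girl
        for (int second = 0; second < 2; ++second) {
            int girls = first + second;
            int weight = girls;                      // chance of saying "girl", in halves
            r.total += weight;
            if (girls == 1) r.hits += weight;        // the unannounced sibling is a boy
        }
    }
    assert(r.total > 0);
    return r;
}
```

With those weights the count comes out at 2 out of 4, matching the 50-50 figure above.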

In that case, I’ll wager a C-note that you’re not going to win $1M or more on the State lottery this weekend. What’s more, I’ll give you 2-1 odds: you don’t win, you owe me only $50. 2-1 odds on a 50-50 bet! How can you lose?

you_with_the_face · October 20, 2006, 3:46pm

Which is why the first question in the OP is worded incorrectly. In order to get 2/3 odds, not only does a family have to be randomly selected, they also have to be the subject of the question. Meaning, you can’t ask about Myrtle, as the author did. Once you do that it changes the question into the kind of situation you’ve described above. Myrtle’s sibling can only be either male or female–two possibilities. But there are 4 different possibilities for a family.

Yeah, but that’s not anywhere near the problem posed in the OP. The condition of importance is that the selected family has to have at least one girl.

ragerdude · October 23, 2006, 4:18am

I agree with both answers to the different questions.

Now that I think about it, my ordering is imprecise and was only illustrating the point that the birth orders BB, BG, GB, and GG are not the way I’m looking at it. Rather, the two equally likely possibilities are: {Known, Boy} or {Known, Girl} where these are (unordered) sets, and Myrtle is the known one. These are the only possibilities, just as in my quarter example above.

That was equivalent to flipping a quarter, having it come up heads, and THEN flipping a penny and not looking at it. We could then say, “There is at least one heads–the one on the quarter. What is the probability that the other coin is tails?” The answer is 1/2.

The situations are analogous. We could also say “There is at least one heads (or girl) – the one on the quarter (or the one that is Myrtle). What is the probability that the other coin (child) is tails (a boy)?”

The only assumption is that there cannot be two quarters or two Myrtles.

Anyway, my theory is that Paulos wrote the problem the 2/3 way, but slipped in the name Myrtle, parenthetically, in an attempt to make the 2/3 result more counterintuitive and thus more interesting. He didn’t, it seems, intend for that determiner to be considered as part of the problem, and it should not be taken into account when we calculate the probability.

I suppose when we interpret something, we should attempt to discover the author’s “original intent” and use that to approximate our working interpretation of the text.

However, I still think it would have been better for Paulos to have improved his phrasing to avoid any doubt about what he meant.

EllisDee · October 24, 2006, 6:12am

Malacandra:

I thought this over far too much on the way home last night. It is misleadingly close to the famous Monty Hall problem, but not quite the same after all. If the condition is “I will select a two-child family at random and then I will tell you either that it contains a boy or a girl, and you must guess the sex of the other child”, then you’re back to a 50-50 guess. Reason: half the time the questioner will hit a single-sex family and has only one sex to tell you about. The other half, he will hit a mixed-sex family and (presumably) flip a coin as to whether to say “boy” or “girl”. That leaves you a 25% chance he hit a BB family and says “boy”; 25% he hit a mixed-sex family and says “boy”; 25% he hits a mixed-sex family and says “girl”; and 25% he hits a GG family and says “girl”. IOW, the likelihood of the other sibling being the opposite gender is exactly 50%.

How is that different from the Monty Hall problem? If Monty randomly selects a door to open, the times he reveals a goat you are 50-50 to switch, not 2-1. It’s only 2-1 if he is forced to reveal a goat. Same with your example here.

Malacandra · October 26, 2006, 12:43pm

Sorry, sloppy on my part. The way I first heard the Monty Hall problem, Monty was obliged to reveal a goat, not the car; which is the same as the Professor being obliged to pick a family that’s not boy-boy. Otherwise you’re right; Monty having a free choice of doors is the same as the Professor having a free choice of families, and only afterwards deciding whether to tell you it contains a girl or a boy.

And that (and nothing to do with the name Myrtle) is where Paulos went wrong.

ultrafilter · October 26, 2006, 1:13pm

This isn’t true. If Monty shows a goat when you pick the correct door with probability α and when you pick the wrong door with probability β, the probability that you’ve picked the correct door given that Monty shows a goat is α/(α + 2β). This is only equal to 1/2 when α = 2β, and equal to 1/3 whenever α = β.

EllisDee · October 27, 2006, 8:18am

I fought the good fight on this to the best of my abilities, but that does not appear to be the case. The discussion was in the thread “Explain the Monty Hall problem to me?” It was made quite clear that if Monty is randomly picking a door to reveal (which means he could reveal the prize), then on the times he happens to reveal a zonk your odds for switching are 50-50, not 2:1. Not surprisingly, you participated in that thread. Since most of that thread deals with the standard problem, I’ll link to the posts that directly address the “random Monty” variation:

In [post=6465715]post 26, CurtC[/post] explains that the problem is different if Monty randomly reveals a door.
In [post=6465983]post 32, I[/post] voiced dissent with this idea.
In [post=6466498]post 48, Xema[/post] formally defines and analyzes the problem.
In [post=6467177]post 58, mazinger_z[/post] reiterates that the odds change if Monty reveals a random door.

Using that last link, you should be on page 2 of the thread. Most of page 2 is devoted to the “random Monty” variant, so I’ll leave the rest as an exercise for the reader. I would point out that in post 67 you invited CurtC to run a simulation to back up his argument. He did, and posted the results (which agreed with his 50-50 prediction) in post 79. pmwgreen ran an independent simulation (that also agreed with the 50-50 prediction) and posted the results in post 90.

You yourself agreed that the conditional probability was 1/2 in post 99.

I revved up for my final “it’s still 2:1 to switch” stand in [post=6507065]post 136[/post]. After the thread had been dead a while, I actually did run a simulation of my own, and was disheartened to find that CurtC and Xema were correct all along.

ultrafilter · October 27, 2006, 1:17pm

I’ve since given it more thought. Here’s a proof of my assertion above.

First, let πᵢ denote the probability that the prize is behind door i, and pᵢ denote the probability that you pick door i. The probability that you pick the correct door is p₁π₁ + p₂π₂ + p₃π₃. If πᵢ = 1/3, that becomes (1/3)(p₁ + p₂ + p₃). The quantity in parentheses is equal to 1, so the probability that you’ve picked the correct door is, in fact, 1/3.

Now, let C denote the event that you’ve picked the correct door, and O denote the event that Monty opens a door. From above, we have that P(C) = 1/3, so P(Cᶜ) = 2/3. Let P(O|C) = α and P(O|Cᶜ) = β. We want P(C|O). By Bayes’ Theorem, P(C|O) = P(O|C)P(C)/(P(O|C)P(C) + P(O|Cᶜ)P(Cᶜ)), which is equal to (α/3)/(α/3 + 2β/3). Remove the common factor of 1/3, and you have that P(C|O) = α/(α + 2β).
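
As a quick way to plug numbers into that result, here is a small sketch of the final formula. The function name `probCorrectGivenGoat` is made up, its parameters `a` and `b` stand for the two conditional probabilities P(O|C) and P(O|not C) from the proof, and it assumes the prize is equally likely to be behind each of the three doors.

```cpp
#include <cassert>

// P(C | O) = a / (a + 2b), where
//   a = P(O | C):     chance Monty shows a goat when you picked the correct door
//   b = P(O | not C): chance Monty shows a goat when you picked a wrong door
double probCorrectGivenGoat(double a, double b) {
    assert(a >= 0.0 && a <= 1.0);
    assert(b >= 0.0 && b <= 1.0);
    assert(a + b > 0.0);  // Monty must be able to show a goat at all
    return a / (a + 2.0 * b);
}
```

If Monty is forced to show a goat, a = b = 1 and the result is 1/3. If he opens one of the other two doors at random, then presumably a = 1 and b = 1/2, which gives 1/2.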
