I was about to run my mouth and contradict you, but I think that you’re right.

Point 3 in **Blaster Master**’s post shows where abstract mathematics and real life take different forks in the road.

In real life, there will be couples that never have a boy, and that will skew the numbers to girls. The actual possibilities are:

B

G (and then die, go infertile, or whatever)

GB

GG

GGB

GGG

GGGB

and so on.

It would take a lot of assumptions to be able to actually run the math on it.

Yeah, all the talk about complete strings or incomplete strings is just obfuscation and doesn’t affect the problem. As long as the probabilities are 50-50 for any particular child born, there will be no predictable long-term swing either way, no matter what family planning scheme you adopt.

Of course, with real people, the actual ratio varies slightly depending on the level of health care: boys are slightly more likely to be conceived but are also more likely to be miscarried without good OB-GYN care, and so on. But such real-world considerations are not part of the problem either.

Points 2 and 3 don’t affect the outcome. If the odds are 50/50 (as you say in point 1) then you get half boys and half girls.

I don’t believe that’s correct. Consider a GGG family. By the original rules, the family would continue to have children. What’s the expected number of boys that they’d have? 1. What’s the expected number of girls that they’d have? 1.

Therefore, if they stop having children, they remove 1 expected boy from the population and 1 expected girl, so the ratio of girls to boys doesn’t change. So I don’t think that #3 is actually a necessary assumption. I’m going to run the probabilities again and see if I can confirm this.

This is actually a really nice, beautiful problem, because the correct answer, “the expected value of the difference is 0”, is so simple and easy to prove: on every “day”, each family has the same probability of adding one boy as of adding one girl (either 50-50, if they haven’t yet had a boy, or 0-0, if they have), and thus the *expected* difference remains 0. And yet there are two different modes of reasoning that lead to strong intuitions for differing wrong answers: that there should be more boys, because they’re what people are “trying for”, and that there should be more girls, because a family can have multiple girls but can only have one boy.
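For anyone who would rather see it than prove it, here is a minimal simulation sketch of the idealized problem. The function name `simulate_families`, the seed, and the family count are my own choices, and it assumes every birth is an independent 50-50 coin flip and that every family keeps going until its first boy.

```python
import random

def simulate_families(n_families, seed=0):
    """Each family has children until its first boy; returns (boys, girls)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    boys = girls = 0
    for _ in range(n_families):
        while True:
            if rng.random() < 0.5:  # assumed 50-50 chance of a boy
                boys += 1
                break               # first boy: this family stops
            girls += 1              # a girl: this family tries again
    return boys, girls

boys, girls = simulate_families(100_000)
print(boys, girls, girls / boys)
```

The boy count always equals the number of families, and the girl count should land close to it, so the printed ratio should hover near 1.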

For the record, if you’re counting the number of trials until some outcome occurs, and that outcome occurs with probability p, the number of trials is distributed geometrically with parameter p.

In this problem, p = 1/2. Going by the second distribution on the linked page, the average number of girls born to each family is 1, and the variance is 2. If you have n families, let S_n denote the total number of girls in the population. By the central limit theorem, the distribution of (S_n - n)/sqrt(2n) is approximately normal with mean 0 and variance 1. In other words, S_n typically differs from n by an amount on the order of sqrt(2n), so as n grows arbitrarily large, the probability that S_n/n lies within any fixed distance of 1 becomes arbitrarily close to 1. Obviously the total number of boys born to n families is n, so the ratio of boys to girls converges (in probability) to 1 as n increases without bound.

Well, here are my thoughts: having two sequences of GGG and GGB versus one sequence of GGGGGB is fundamentally the same, except that G5B has a 1.56% probability of occurring, while G2B has a 12.5% chance and G3 has a 12.5% chance times whatever the probability of an early death is. However, you’ll note that an early death will never result in sequences that have extra boys (as each sequence has at most one boy), but it WILL result in sequences that favor girls.

IOW, it will likely depend on what assumptions you make about that probability, and I honestly didn’t feel like dealing with it, since it’s really an abstract mathematical problem, so I just threw it out as an assumption so I could ignore the complications… but I’m willing to run a few simulations to play with it if we think it’s a relevant enough question.

Don’t discount the three girls they’ve already had. There’s no extra three boys to balance them out.

Or, in short, linearity of expectations is, as always, awesome. The expected value of a sum of variables, *even highly non-independent variables*, is the sum of their individual expected values. (More generally, this works for any linear combination). So, letting the variable V_{F, d} be +1 if Family F has a boy on day d, -1 if Family F has a girl on day d, and 0 otherwise, we see that the total difference between boys and girls is the sum of all these variables. Furthermore, each of these variables has expected value 0 (since the expected value of V_{F, d} is just the probability that F hasn’t had a boy by day d [and thus will try again today] times (+1 * 50% [for having a boy this time] + -1*50% [for having a girl this time]) = 0). Thus, the expected value of the sum being the sum of the expected values, we get the answer 0 for the expected total difference between boys and girls.

I’m surprised; are you saying #3 from **Blaster Master** is a necessary assumption? **Kendall Jackson** is correct; it’s not a necessary assumption. Suppose, for example, we tossed in that people have some probability of dying each day. The expected value argument I gave in my last post would still go through: the expected value of V_{F, d} would be the probability that the family was alive on day d and hadn’t yet had a boy, times the same (+1 * 50% + -1*50%) = 0 term.
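The mortality variant can be checked numerically too. This is only a sketch under assumptions of my own: each “day” a family first dies with probability `p_death` (0.2 here, an arbitrary value), and otherwise has one child with a 50-50 chance of a boy. The name `simulate_with_mortality` is hypothetical.

```python
import random

def simulate_with_mortality(n_families, p_death, seed=1):
    """Families stop at their first boy or when they die; returns (boys, girls)."""
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(n_families):
        while True:
            if rng.random() < p_death:
                break               # family dies before trying again
            if rng.random() < 0.5:  # assumed 50-50 chance of a boy
                boys += 1
                break               # first boy: this family stops
            girls += 1
    return boys, girls

boys, girls = simulate_with_mortality(200_000, p_death=0.2)
print(boys, girls, girls / boys)
```

Fewer families end up with a boy than in the idealized version, but the girl count drops by the same expected amount, so the ratio should still come out near 1.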

I think **BM** #3 is needed. Here’s my reasoning.

There is a maximum number of children a couple can have before menopause. Let’s say it’s 8. So here are our probabilities:

```
B           50%
GB          25%
GGB         12.5%
GGGB        6.25%
GGGGB       3.125%
GGGGGB      1.5625%
GGGGGGB     0.78125%
GGGGGGGB    0.390625%
GGGGGGGG    0.390625%
```

Notice that the last entry is double its expected value? Because mom can’t have any more babies!

Can you explain to me (English major!) why I’m wrong?

No, I think that in your example you are right.

I believe that if #3 were replaced with “After having a girl, a couple decides with probability p whether to have another child”, then you’d still see a 1:1 ratio of girls to boys, but I haven’t proven that.
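Short of a proof, that conjecture can at least be checked with exact arithmetic. The sketch below reads “p” as the probability that a couple continues after a girl, which is my interpretation of the wording, and the function name and the cutoff `max_rounds` are also my own. It tracks the probability that a family is still trying in each round and adds up the expected boys and girls.

```python
from fractions import Fraction

def expected_counts_with_quitting(p_continue, max_rounds):
    """Exact expected (boys, girls) per family over the first max_rounds births."""
    half = Fraction(1, 2)
    p_trying = Fraction(1)  # probability the family has a child this round
    exp_boys = Fraction(0)
    exp_girls = Fraction(0)
    for _ in range(max_rounds):
        exp_boys += p_trying * half   # a birth this round is a boy half the time
        exp_girls += p_trying * half  # and a girl the other half
        # To try again, the family needs a girl AND a decision to continue.
        p_trying *= half * p_continue
    return exp_boys, exp_girls

print(expected_counts_with_quitting(Fraction(1, 2), 50))
```

Each round adds the same amount to both totals, whatever value `p_continue` takes, which is really just the linearity argument written out as a loop.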

What do you mean when you say the last entry is “double its expected value”? Notice that in your example, the mean number of girls per family is 0.99609375, which is the same as the mean number of boys per family. This is not a counterexample at all. For simplicity, we could look at the analogue where people can only have at most two children. Then, it’s

B 50%

GB 25%

GG 25%

Average number of boys per family? 0.75. Average number of girls per family? 0.75. Average difference? 0.
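The same bookkeeping can be done exactly for any cap on family size. This is a small sketch with a hypothetical helper, `expected_counts`, that enumerates the possible outcomes (some girls followed by a boy, or all girls up to the cap) and weights each by its probability, assuming 50-50 births.

```python
from fractions import Fraction

def expected_counts(max_children):
    """Exact expected (boys, girls) per family: stop at first boy or at the cap."""
    exp_boys = Fraction(0)
    exp_girls = Fraction(0)
    for girls_first in range(max_children):
        # Outcome: girls_first girls, then a boy.
        p = Fraction(1, 2) ** (girls_first + 1)
        exp_boys += p
        exp_girls += p * girls_first
    # Outcome: all girls up to the cap, no boy.
    p_all_girls = Fraction(1, 2) ** max_children
    exp_girls += p_all_girls * max_children
    return exp_boys, exp_girls

print(expected_counts(2))  # the two-child analogue
print(expected_counts(8))  # the cap of 8 from the menopause example
```

With a cap of 2 both averages are 3/4, and with a cap of 8 both are 255/256 = 0.99609375, matching the figures quoted in this post.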

I outlined a rigorous proof of essentially this specific replacement in my last post. But, you don’t actually need any replacement at all. **Randy Seltzer**’s post did not actually furnish a counterexample. As long as every birth has equal probability of furnishing a boy or a girl, the linearity of expectations argument goes through and we get an expected difference of 0.

Well, I suppose you meant “double the value we would expect (in the non-technical sense) if it weren’t for the menopause”. Sorry, I just got confused because I automatically took “expected value” in the technical sense (i.e., probabilistically weighted arithmetic mean).

Perhaps this would be a good way to illustrate the situation: let’s suppose that every time a baby is born, the world splits into parallel universes: one where it’s a boy and one where it’s a girl. We can view this as cloning everyone in the world, and then adding one boy and one girl. Every other probabilistic event splits the universe as well: for example, if someone has probability 1/3 of dying, then when the time comes, the universe splits into three, and two of his clones live while one dies.

Clearly, the difference between boys and girls over all parallel universes will remain 0 as long as births have 50/50 boy/girl probabilities and causes of death are also equally likely to strike boys as girls. *Nothing else matters*. And since the difference taken over all parallel universes is exactly the same thing, mathematically speaking, as the expected difference given by the probability distribution, this tells us that the expected difference is 0.

**Indistinguishable** is right, and though the math was staring me in the face, I had to look at the simulation to see what was happening. With the simple B, GB, GG case, B has twice the chance of being selected as either of the other two cases. So it can easily be rewritten as B, B, GB, GG, with each having equal probability, and with simple counting you’ll see there’s the same number of Bs and Gs there (3 of each).

Sorry for the hijack.

Ah. Thanks **Indistinguishable**. I had to draw a picture.

This is why I quit after calculus.

(And yes, for me “expected value” means “value I would expect,” though I realize that’s a term of art with a different meaning.)

The sad thing about this made-up problem is that it has been tested already in the real world.

Normally there are about 105 boys born for 100 girls (more boy babies die after birth), so the rate in the early 80s wasn’t too far out of line.

The situation in China isn’t an accurate reflection of the problem stated in the OP. In China unborn girls are being aborted, or newborn baby girls are being killed. That means that the birthrate is no longer 50/50.