In column http://www.straightdope.com/columns/031024.html The Illustrious Master finishes off with the comment

If you have a look at the US census on boys’ names http://www.census.gov/genealogy/names/dist.male.first you will see that Mary is a recognised first name for boys. It is distinctly probable that there are or have been male individuals named Mary Smith.

Even worse. The rank of Mary is 699. That means it’s the 699th most common male first name in the US census. It beats names such as Truman (716), Joan (759), Jaques (763), Shirley (936), Richie (970), Hank (983), Geraldo (1024), Ambrose (1096) and Broderick (1218).

According to the census bureau, 0.009% of males have the first name Mary and 1.006% of people have the last name Smith. Assuming the two are independent, males named Mary Smith should constitute (.00009)(.01006)=.9054 ppm (persons per million). So if you guess that any given person named Mary Smith is female, you will be correct, on average, 99.99990946 percent of the time. I daresay that is slightly better odds than graphology would provide.
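For anyone who wants to check the arithmetic in that post, here is a minimal Python sketch. It only reproduces the multiplication and the "one minus" step using the two census percentages quoted above; the variable names are made up for illustration, and whether this is the right quantity to compute is a separate question that the replies take up.

```python
# Figures quoted in the post, converted from percentages to fractions.
p_mary_given_male = 0.00009   # 0.009% of males have the first name Mary
p_smith = 0.01006             # 1.006% of people have the last name Smith

# Assumption (stated in the post): first and last names are independent,
# so the two shares can simply be multiplied.
p_male_mary_smith = p_mary_given_male * p_smith

# Express the product per million, and take the complement as a percentage.
per_million = p_male_mary_smith * 1_000_000
complement_percent = (1 - p_male_mary_smith) * 100

print(f"{per_million:.4f} per million")
print(f"{complement_percent:.8f} percent")
```

The product comes out at roughly 0.905 per million, and the complement matches the 99.9999...-percent figure in the post.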

bibliophage, I agree with your conclusion that the statistics would give much better odds than graphology, but have to question the details.

The 99.99990946-percent figure would be the share of people who are not in the category “males named Mary Smith.” In other words, people who don’t meet both the sex and name criteria. They are either not male, or not named Mary Smith…although they could be one or the other. That’s because you found 1 minus the probability of being a (male named Mary) with (last name Smith). (Parentheses added for logical clarity.)

This does not, however, indicate that Mary is an overwhelmingly female name. The only conclusion you can draw is that just under one in a million people are named Mary Smith and male. Is the number of females named Mary Smith greater? We need more information:

To find the odds of a Mary Smith being male, you have to find the percentage of Marys that are male (rather than the percentage of males that are Marys, as above). If that percentage is P, the percentage of female Marys follows as (1-P).

The last name Smith, being independent, is really irrelevant. The percentage of Mary Smiths that are male ought to be the same as the percentage of Mary Andersons that are male. That assumes, once again, that first and last names are assigned independently…which is probably true for the most part.

According to the census, 2.6% of the female population has the first name Mary. Just on 1% of the entire population has the surname Smith.

Assuming independence of the variables, 0.026 * 0.01 = 0.00026 probability of a female being named Mary Smith. This equates to 260 females per million named Mary Smith.

In contrast, one male per million is named Mary Smith.

Assuming equal numbers of males and females, any random Mary Smith will be male once in 261 times, so around 4 per 1000 Mary Smiths are male.
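The comparison in the last few posts can be sketched in Python as a small Bayes'-rule calculation. This is only an illustration under the thread's own assumptions (equal numbers of males and females, first and last names independent); the function name `share_male` and the second surname share of 0.002 are hypothetical, chosen just to show that the surname cancels out.

```python
def share_male(p_first_male, p_first_female, p_last, p_male=0.5):
    """Fraction of people with a given full name who are male.

    p_first_male / p_first_female: share of each sex with the first name.
    p_last: share of the whole population with the surname.
    p_male: share of the population that is male (0.5 assumed here).
    """
    male = p_male * p_first_male * p_last
    female = (1.0 - p_male) * p_first_female * p_last
    return male / (male + female)


# Figures quoted in the thread, as fractions.
P_MARY_MALE = 0.00009    # 0.009% of males are named Mary
P_MARY_FEMALE = 0.026    # 2.6% of females are named Mary
P_SMITH = 0.01           # about 1% of people are named Smith

smith = share_male(P_MARY_MALE, P_MARY_FEMALE, P_SMITH)
# Hypothetical rarer surname: the result should not change.
other = share_male(P_MARY_MALE, P_MARY_FEMALE, 0.002)

print(f"Mary Smith: about 1 in {1 / smith:.0f} is male")
print(f"Other surname: about 1 in {1 / other:.0f} is male")
```

With the unrounded 0.009% figure this gives about 1 in 290 rather than 1 in 261; the difference comes from rounding the male count up to one per million. Either way the answer is a few per thousand, and it is the same for any surname.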

Consider also that some (most?) of the males listed as Mary from the census data had their gender incorrectly registered. With Mary being such a popular name, it would only take a very small percentage of gender errors to produce so many male Marys.
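To put a rough number on "a very small percentage", here is a back-of-the-envelope Python sketch. It rests on a deliberately simple model that is entirely an assumption: equal numbers of men and women, the same fraction of each sex recorded as the other, and no genuinely male Marys at all. Under that model the recorded share of male Marys is just the error rate times the female share.

```python
# Figures quoted in the thread, as fractions.
P_MARY_MALE_OBSERVED = 0.00009   # 0.009% of recorded males are named Mary
P_MARY_FEMALE = 0.026            # 2.6% of females are named Mary


def implied_error_rate(observed_male_share, female_share):
    """Misregistration rate that would by itself explain the male share.

    Assumed model: equal numbers of men and women, a fraction e of each
    sex recorded as the other, and no real male bearers of the name.
    The recorded male group is then (1 - e) real men plus e real women,
    so the recorded male share of the name is e * female_share.
    """
    return observed_male_share / female_share


rate = implied_error_rate(P_MARY_MALE_OBSERVED, P_MARY_FEMALE)
print(f"{rate:.2%} of records, or about 1 in {1 / rate:.0f}")
```

So an error rate of roughly a third of a percent would be enough on its own. That says nothing about whether the real error rate is that high; it only shows the size of error the theory needs.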

Dang. That’s what I get for trying to think without my recommended daily allowance of caffeine.

You are making at least two erroneous assumptions here.

First, you assume that people don’t often name their male children Mary. This ignores the relatively common practice of naming male children after their maternal grandmother. The usual practice in these cases is to let the male child use a ‘male’ middle name, e.g. M John McGillicuddy. There are quite a few individuals who do this. How many do so because of a female first name?

Second, you assume the census department makes mistakes. This is quite possible, but the census department also recognises this and adjusts data results to compensate. They are, after all, one of the foremost practitioners of statistical analysis.

I wasn’t making assumptions, I was actually asking if this was possible. It seems to me that in any sampling there is the chance for errors to creep into the data.

How would the census correct if someone checked the wrong gender box on a census form? It’s a serious question. I don’t know how they would correct for this, or whether they would expend much effort on something like naming.

A check would be to see if there are any other traditional women’s names that show up on the male names list, and vice versa. Some would likely be actual names given to children, but how would we verify this? Except for names that are used for both genders (Leslie, Chris), I don’t personally know anyone with a name traditionally used for the opposite gender, and definitely no males named after their maternal grandmother.
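That check could be sketched in Python along these lines. The function `cross_listed` and the dictionary layout (name mapped to percent of that sex) are assumptions for illustration, not the census file format. Only the two Mary figures come from this thread; the `PLACEHOLDER_` entries are made up to show that names on just one list are ignored.

```python
def cross_listed(male_pct, female_pct):
    """Return names present on both lists, with the male share of each.

    male_pct / female_pct map NAME -> percent of that sex with the name.
    Assumes equal numbers of males and females.
    """
    rows = []
    for name in set(male_pct) & set(female_pct):
        m, f = male_pct[name], female_pct[name]
        rows.append((name, m, f, m / (m + f)))
    # Smallest minority share first: the most one-sided names lead the list.
    return sorted(rows, key=lambda row: min(row[3], 1 - row[3]))


# Mary figures are from the thread; the placeholders are invented.
male = {"MARY": 0.009, "PLACEHOLDER_M": 1.0}
female = {"MARY": 2.6, "PLACEHOLDER_F": 1.0}

result = cross_listed(male, female)
for name, m, f, share in result:
    print(f"{name}: {m}% of males, {f}% of females, male share {share:.4f}")
```

A list like this only shows which names overlap and how lopsided they are. It cannot say whether an overlap is a genuine naming choice or a mis-ticked box, which is the open question here.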

I’m not contesting what you say, just saying that it doesn’t match my personal experiences.

Only to throw a spanner in the works.

In Spanish, the name “María” is used for boys only in compounds like “José María” or “Juan María”. No boy is called “María” alone (of course there might be an example, although with that name you’re not likely to survive school). Similarly “José” may be used by girls in the compound “María José”.

Next you’ll be telling me there are boys named Sue.

No, but I did find Joan, Rosario, Patricia, and Shirley. (As well as Son and Man.) Since there are only a few of what I would consider traditional girl-only names I guess that would suggest the false data entry theory isn’t likely.

So who is naming their son Joan?

“Shirley” is not necessarily a woman’s name, even though it has become so in recent US practice, so there’s nothing all that surprising about that. And “Rosario” is grammatically masculine to begin with. But “Joan” and “Patricia” are feminizations of male names, so they seem distinctly odd.

Although Rosario is grammatically masculine and therefore a possible boy’s name, in Spanish it is almost unheard-of as such, especially in modern times; (almost) all Rosarios are girls. Interestingly enough, Jesús can be used for both men and women.

What relatively common practice of naming a male child after the maternal grandmother?

http://www.tcarden.com/tree/ensor/Name.html shows different naming patterns.
Of interest is the 19th century English & Welsh pattern of second son named after mother’s mother. This still occurs but much less frequently than then.

Or, how about a man named Evelyn, e.g., Evelyn Waugh?

Interestingly enough the “Evelyn Waugh was a man” thing comes up in the movie Lost in Translation. I laughed. Me and maybe only me.

I believe that “Joan” (which I think is pronounced ‘zho-AHN’) is “John” in Catalan (e.g. Joan Miro). Could some of the males with “feminine” names be the children of immigrants?

