# Birthday statistics question

I’ll admit I’ve always been weak with statistics, so I’ll throw this one out for the more mathematically inclined.

How large a group of people would you have to have before you had a 50% or better chance of at least one person in the group being born on each day of the year? Assume that birthdates are randomly distributed and ignore February 29.

With a couple of thousand in my family tree, I still had about 2 dates unused a couple of years ago when I checked it out.

I can’t give you an exact figure, because there were people listed without birthdates. A couple of thousand is conservative considering there are presently 8,000 in my family tree.

OK, it’s late, so I may have made a mistake, but here it is:

Given n people, the probability that all possible birthdays are represented is

$$\frac{1}{365^n} \sum_{i=0}^{365} (-1)^i \binom{365}{i} (365 - i)^n$$

where the sum is from i = 0 to 365.

Playing around with this, I get that the probability for 2,286 people is about 49.941%, and the probability for 2,287 people is about 50.037%.
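For anyone who wants to check these figures, here is a minimal Python sketch of the formula above using exact integer arithmetic. The names `p_all_days` and `smallest_group` are made up for this illustration and do not come from the thread. It keeps the same assumptions: 365 equally likely birthdays and no leap day.

```python
from fractions import Fraction
from math import comb


def p_all_days(n, days=365):
    """Exact probability that n people cover every one of `days` birthdays.

    Inclusion-exclusion over the number i of days that are missing.
    """
    numerator = sum((-1) ** i * comb(days, i) * (days - i) ** n
                    for i in range(days + 1))
    return Fraction(numerator, days ** n)


def smallest_group(target=Fraction(1, 2), days=365):
    """Smallest n with p_all_days(n) >= target, by doubling then bisection.

    This relies on the probability never decreasing as n grows.
    """
    lo, hi = days - 1, days          # p_all_days(days - 1) is 0
    while p_all_days(hi, days) < target:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if p_all_days(mid, days) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for n in (2286, 2287):
        print(n, float(p_all_days(n)))
    print("smallest group:", smallest_group())
```

Working in `Fraction` avoids the rounding trouble you would get from summing huge alternating terms in floating point; the conversion to `float` happens only at the very end.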

Cabbage Rules!

(And another Great Band Name candidate is born as well.)

Cabbage does indeed rule, but one should not forget the brassicous assumptions made when coming up with that formula:
- 365 birthdays per year (ignoring 29 February)
- all birthdays have an equal probability

Some birthdays are more likely than others, therefore the probability of two people in a group of (e.g.) 50 having the same birthday is higher than would be calculated by the formula given by Cabbage.

Yeah, but who wants to do the problem without those assumptions?

I think Cabbage did hit the nail right on the head.

Actually, in a group of 23 people, the odds of two people having the same birthday are just over .5. Here is a cite: http://www.mste.uiuc.edu/reese/birthday/explanation.html

Different problem, dude.

Crimeny, just ignore my last post. I’ll crawl back to my cave now.

Only 2287? A surprisingly small number. I’d guesstimated it would be much larger.

Incidentally, this board was not only the source for the answer but the source of the question as well. I’ve noticed the number of birthday threads posted lately and thought “with the number of posters on this board (19175) there’s probably someone having a birthday every day” followed by “I wonder if that’s true?” I considered starting a thread to find out if it was literally true but decided to figure out the odds first. A few minutes in this attempt convinced me I had no idea of how to do that. So I decided if the SDMB got me into this quandary, the SDMB should get me out of it.

Thank you Cabbage and all.

I’m not any kind of an authority, but it seems to me that the group matters - if it’s a bunch of people in a nightclub out to see the lounge act then the birthday odds might be skewed a bit by the “people who go out because it’s someone’s birthday” factor.

Plus, as mentioned by A.W. some birthdays are more likely than others.

My nitpick is, it’s not statistics unless these things are really taken into account scientifically. It’s just math. Maybe an insurance actuary will drop by this thread and give us a shiny opinion.

I checked, and for 10,000 people, the probability is 99.99999996%. For the record, I tried 19,175; the result was so close my computer rounded it off to 100%.

I really can’t see the “some birthdays are more likely than others” making much difference. I’d be willing to bet that any differences would not have much effect on the 2,287 figure, and that that figure can safely be rounded off to 2,300 and be considered accurate (I really don’t think the differences could possibly push the actual amount anywhere close to 2,400).

If we also include leap day, that could make a difference. I don’t really want to bother with including that extra calculation from the beginning, but if we assume that 1 in 4*365 = 1460 people are born on leap day (a safe assumption, IMO), we get that if we have 2300 people gathered together, there’s about a 79.3% chance somebody was born on leap day.
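The leap-day figure is a one-line calculation. A quick sketch, assuming each person independently has a 1/1460 chance of a leap-day birthday as stated above (the function name `p_someone_on_leap_day` is just for illustration):

```python
def p_someone_on_leap_day(n, p_leap=1 / 1460):
    """Chance that at least one of n independent people was born on leap day."""
    # Complement of "nobody was born on leap day".
    return 1 - (1 - p_leap) ** n


if __name__ == "__main__":
    print(f"{p_someone_on_leap_day(2300):.1%}")
```

For 2300 people this should come out at roughly the 79.3% quoted above.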

So, in an attempt to completely kill this problem (including leap day part), I’m gonna make the following assumptions:

1. Birthdays are equally distributed over all days of the year, with the exception of leap day, which is the birthday of exactly 1/1460th of the population.

2. The fact that birthdays aren’t uniformly distributed doesn’t matter much, it’s close enough.

So,

P(all 366 days are represented among n people) =

P(all 365 regular days are represented and leap day is represented) =

P(all 365 regular days are represented) * P(leap day is represented, given that all other 365 days are represented)=

f(n) * g(n),

where f(n) is the function I posted in my first post, and

g(n) = (1 - (1459/1460)^(n-365)).

This breaks 50% at 2473 people.

I’m confident the answer in the real world is in the ballpark of 2500.

I just noticed that for the part where we include leap day, the function f I used should have been changed somewhat. Making it right would be somewhat messy, so it’s something I really don’t want to bother with. I do know that changing it would put the answer somewhere between 2467 and 2473 people, so it’s not really that significant. (I made another mistake and should have had 2467 instead of 2473 in my previous post. I got the 2473 figure because I had replaced all of the "365"s in f with "366"s and forgot about it. I know that these are upper and lower bounds for the corrected answer, though, and that’s good enough for me).

Also, I thought that I’d mention that including leap day, the probability that all birthdays are represented among 10,000 people is still greater than 99%.

Oh, and AmbushBug, group definitely does matter. If we’re talking about a group meeting of the Born on Leap Day Club, obviously everything I’ve done is gonna be bullshit. I’m assuming that it’s just a random sample of the population (a safe assumption (with respect to birthdays, anyway) if ultimately we’re applying this to this message board).

I tried deriving a formula for this like Cabbage so brilliantly did, but got nowhere; my respect overfloweth. Instead, I wrote a program which computes one million sample populations of any given size, and determines how many fit the criterion that every birthday is covered. Using the initial assumptions, I got:

For 2286 people: 49.9%
For 2287 people: 50.1%

This agrees pretty comfortably with Cabbage’s results, so that was reassuring. Encouraged, I then redid the program to take leap days into account - I assumed that the probability of someone being born on 29 Feb was 1/1461, and for any other day, 4/1461. This gave me the following probabilities:

For 2286 people: 39.4%
For 2420 people: 49.9%
For 2421 people: 50.0%

This is below the lower bounds for Cabbage’s result, though, so I guess I probably made an error somewhere.
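The actual program was not posted, so the following is only a rough Python sketch of the kind of simulation described above, not the original code. The function name `coverage_rate`, the default trial count, and the fixed seed are all assumptions made for this example. It runs far fewer than a million trials, so expect a percentage point or so of noise.

```python
import random


def coverage_rate(n, trials=2000, weights=None, days=365, seed=0):
    """Estimate P(every day appears) among n people by random sampling.

    `weights` gives relative birthday frequencies, one per day; when it is
    None, all `days` days are equally likely.
    """
    rng = random.Random(seed)
    if weights is not None:
        days = len(weights)
    population = range(days)
    hits = 0
    for _ in range(trials):
        sample = rng.choices(population, weights=weights, k=n)
        if len(set(sample)) == days:   # every day showed up at least once
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("365 equal days, n=2287:", coverage_rate(2287))
    # Leap-day model from the post: 4/1461 per ordinary day, 1/1461 for 29 Feb.
    leap_weights = [4] * 365 + [1]
    print("with leap day, n=2421:", coverage_rate(2421, weights=leap_weights))
```

Raising `trials` tightens the estimate at the cost of run time; the standard error shrinks like one over the square root of the number of trials.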

I tend to think that birthdays not being evenly distributed would make a big difference, but I don’t have the stats necessary to plug in probabilities and test this idea.

Actually, now that I look at it, I have a pretty good idea that my function g is wrong.

The idea I had was that 365 of the n people had the 365 regular birthdays, and I wanted to find the probability that someone in the group was born on leap day. So I threw the 365 people out (n-365) (I know none of them were born on leap day), found the probability that none of the remaining people were born on leap day, and subtracted that from one (so that at least one was born on leap day).

1 - (1459/1460)^(n-365)

By the way, that should have been 1460/1461 (as Achernar has) instead of 1459/1460. I forgot to add 1. :o

The flaw is the old Monty Hall cars and goats type fallacy. For example, if I have two children, and I know one is a boy, what’s the probability that the other is a girl? I can’t just throw one of the kids out, and find the probability the other kid is a girl (which would give me 1/2); I don’t know which child is a boy. The real probability is 2/3.
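The two-children example is small enough to check by brute force. Here is a short Python sketch that enumerates the four equally likely families and compares the correct conditioning with the "throw one kid out" shortcut (variable names are illustrative only):

```python
from fractions import Fraction
from itertools import product

# All four equally likely (older, younger) combinations for two children.
families = list(product("BG", repeat=2))

# Condition on "at least one is a boy", without saying which child it is.
with_boy = [f for f in families if "B" in f]
p_other_is_girl = Fraction(sum("G" in f for f in with_boy), len(with_boy))

# The shortcut: decide the *first* child is the boy and look only at the second.
first_is_boy = [f for f in families if f[0] == "B"]
p_shortcut = Fraction(sum(f[1] == "G" for f in first_is_boy), len(first_is_boy))

print(p_other_is_girl, p_shortcut)  # 2/3 1/2
```

The shortcut quietly conditions on a different event (a specific child being a boy), which is why it gives 1/2 instead of 2/3.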

I don’t know how much trouble offhand that will be to fix, maybe I’d be better off starting from scratch in that case (it’s easy enough to fix when there are only two people and you’re only considering boy/girl; but I imagine it may be messy when you’re talking about n people and 366 different possible birthdays). I can’t do it now, I’m tired and need some sleep; anyone else see a way to fix it?

Anyway, Achernar, because of this, I’m quite sure my lower bound is too high. I imagine your figure of 2421 is pretty close after all, if not spot on.
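One possible way around the conditioning problem is to skip the f(n) * g(n) split and apply inclusion-exclusion directly to all 366 days. The Python sketch below does that under Achernar's assumption of 4/1461 for each ordinary day and 1/1461 for leap day. The function name `p_all_366` and its parameters are hypothetical, and this is not something anyone in the thread actually posted.

```python
from fractions import Fraction
from math import comb


def p_all_366(n, ordinary_weight=4, leap_weight=1, ordinary_days=365):
    """Exact P(every ordinary day and leap day appears) among n people.

    Each ordinary day has probability ordinary_weight / total, and leap day
    has leap_weight / total. Inclusion-exclusion runs over i missing ordinary
    days and j (0 or 1) missing leap days.
    """
    total = ordinary_weight * ordinary_days + leap_weight   # 1461 by default
    numerator = 0
    for i in range(ordinary_days + 1):
        for j in (0, 1):
            # Weight of the days still allowed when i ordinary days and
            # j leap days are excluded.
            remaining = total - ordinary_weight * i - leap_weight * j
            numerator += (-1) ** (i + j) * comb(ordinary_days, i) * remaining ** n
    return Fraction(numerator, total ** n)


if __name__ == "__main__":
    for n in (2420, 2421):
        print(n, float(p_all_366(n)))
```

If the reasoning holds, the values near n = 2421 should land close to the simulated figures rather than the 2467 to 2473 range.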