Straight Dope Message Board > Main Birthday statistics question

#1
11-17-2001, 03:40 AM
 Little Nemo Charter Member Join Date: Dec 1999 Location: Western New York Posts: 52,780
I'll admit I've always been weak with statistics, so I'll throw this one out for the more mathematically inclined.

How large a group of people would you have to have before you had a 50% or better chance of at least one person in the group being born on each day of the year? Assume that birthdates are randomly distributed and ignore February 29.
#2
11-17-2001, 03:56 AM
 Eliahna Charter Member Join Date: Dec 2000 Location: Victoria, Australia Posts: 6,210
With a couple of thousand in my family tree, I still had about 2 dates unused a couple of years ago when I checked it out.

I can't give you an exact figure, because there were people listed without birthdates. A couple of thousand is conservative considering there are presently 8,000 in my family tree.
#3
11-17-2001, 05:37 AM
 Cabbage Guest Join Date: Sep 1999
OK, it's late, so I may have made a mistake, but here it is:

Given n people, the probability that all possible birthdays are represented is

SUM [(-1)^i * C(365,i) * (365 - i)^n] / 365^n

where the sum is from i = 0 to 365.

Playing around with this, I get that the probability for 2,286 people is about 49.941%, and the probability for 2,287 people is about 50.037%.
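
For anyone who wants to check those two figures, here is a minimal sketch of the sum above in Python. It is not Cabbage's own code; the function name `p_all_days_covered` and the `days` parameter are made up for illustration. It uses exact integer arithmetic rather than floats, since the alternating terms are large and cancel heavily.

```python
from fractions import Fraction
from math import comb


def p_all_days_covered(n: int, days: int = 365) -> Fraction:
    """Exact probability that n uniform random birthdays cover all `days` days."""
    # Inclusion-exclusion: i counts the days that are forced to be missing.
    numerator = sum(
        (-1) ** i * comb(days, i) * (days - i) ** n
        for i in range(days + 1)
    )
    return Fraction(numerator, days ** n)


if __name__ == "__main__":
    # The two group sizes quoted in the post, under the same assumptions
    # (365 equally likely days, no leap day).
    for n in (2286, 2287):
        print(n, float(p_all_days_covered(n)))
```

A quick sanity check on a small case: with 2 days and 2 people the function returns exactly 1/2, which is easy to confirm by listing the four possible outcomes.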
__________________
...ebius sig. This is a moebius sig. This is a mo...
(sig line courtesy of WallyM7)
#4
11-17-2001, 05:17 PM
 ftg Guest Join Date: Feb 2001
Cabbage Rules!

(And another Great Band Name candidate is born as well.)
#5
11-17-2001, 05:44 PM
 Cabbage Guest Join Date: Sep 1999
#6
11-17-2001, 06:27 PM
 Arnold Winkelried Charter Member Join Date: Oct 1999 Location: Irvine, California, USA Posts: 14,822
Cabbage does indeed rule, but one should not forget the brassicous assumptions made when coming up with that formula:
• 365 birthdays per year (ignoring 29 February)
• all birthdays have an equal probability

Some birthdays are more likely than others, therefore the probability of two people in a group of (e.g.) 50 having the same birthday is higher than would be calculated by the formula given by Cabbage.
#7
11-17-2001, 06:34 PM
 ultrafilter Guest Join Date: May 2001
Quote:
 Originally posted by Arnold Winkelried Cabbage does indeed rule, but one should not forget the brassicous assumptions made when coming up with that formula:
• 365 birthdays per year (ignoring 29 February)
• all birthdays have an equal probability
Some birthdays are more likely than others, therefore the probability of two people in a group of (e.g.) 50 having the same birthday is higher than would be calculated by the formula given by Cabbage.
Yeah, but who wants to do the problem without those assumptions?

I think Cabbage did hit the nail right on the head.
#8
11-17-2001, 06:35 PM
 Lost In Reality Guest Join Date: May 2001
Actually, in a group of 23 people, the odds of two people having the same birthday are just over .5. Here is a cite: http://www.mste.uiuc.edu/reese/birth...planation.html
#9
11-17-2001, 06:46 PM
 ultrafilter Guest Join Date: May 2001
Quote:
 Originally posted by Lost In Reality Actually, in a group of 23 people, the odds of two people having the same birthday are just over .5. Here is a cite: http://www.mste.uiuc.edu/reese/birth...planation.html
Different problem, dude.
#10
11-17-2001, 06:54 PM
 Lost In Reality Guest Join Date: May 2001
Crimeny, just ignore my last post. I'll crawl back to my cave now.
#11
11-18-2001, 01:28 AM
 Little Nemo Charter Member Join Date: Dec 1999 Location: Western New York Posts: 52,780
Only 2287? A surprisingly small number. I'd guesstimated it would be much larger.

Incidentally, this board was not only the source for the answer but the source of the question as well. I've noticed the number of birthday threads posted lately and thought "with the number of posters on this board (19175) there's probably someone having a birthday every day" followed by "I wonder if that's true?" I considered starting a thread to find out if it was literally true but decided to figure out the odds first. A few minutes in this attempt convinced me I had no idea of how to do that. So I decided if the SDMB got me into this quandary, the SDMB should get me out of it.

Thank you Cabbage and all.
#12
11-18-2001, 01:44 AM
 AmbushBug Charter Member Join Date: May 2000 Location: Neon Desert Posts: 691
nitpick

I'm not any kind of an authority, but it seems to me that the group matters - if it's a bunch of people in a nightclub out to see the lounge act then the birthday odds might be skewed a bit by the "people who go out because it's someone's birthday" factor.

Plus, as mentioned by A.W. some birthdays are more likely than others.

My nitpick is, it's not statistics unless these things are really taken into account scientifically. It's just math. Maybe an insurance actuary will drop by this thread and give us a shiny opinion.
#13
11-18-2001, 03:05 AM
 Cabbage Guest Join Date: Sep 1999
I checked, and for 10,000 people, the probability is 99.99999996%. For the record, I tried 19,175; the result was so close my computer rounded it off to 100%.

I really can't see the "some birthdays are more likely than others" making much difference. I'd be willing to bet that any differences would not have much effect on the 2,287 figure, and that that figure can safely be rounded off to 2,300 and be considered accurate (I really don't think the differences could possibly push the actual amount anywhere close to 2,400).

If we also include leap day, that could make a difference. I don't really want to bother with including that extra calculation from the beginning, but if we assume that 1 in 4*365 = 1460 people are born on leap day (a safe assumption, IMO), we get that if we have 2300 people gathered together, there's about a 79.3% chance somebody was born on leap day.

So, in an attempt to completely kill this problem (including leap day part), I'm gonna make the following assumptions:

1. Birthdays are equally distributed over all days of the year, with the exception of leap day, which is the birthday of exactly 1/1460th of the population.

2. The fact that birthdays aren't uniformly distributed doesn't matter much, it's close enough.

So,

P(all 366 days are represented among n people) =

P(all 365 regular days are represented and leap day is represented) =

P(all 365 regular days are represented) * P(leap day is represented, given that all other 365 days are represented)=

f(n) * g(n),

where f(n) is the function I posted in my first post, and

g(n) = (1 - (1459/1460)^(n-365)).

This breaks 50% at 2473 people.

I'm confident the answer in the real world is in the ballpark of 2500.
#14
11-18-2001, 04:57 AM
 Cabbage Guest Join Date: Sep 1999
I just noticed that for the part where we include leap day, the function f I used should have been changed somewhat. Making it right would be somewhat messy, so it's something I really don't want to bother with. I do know that changing it would put the answer somewhere between 2467 and 2473 people, so it's not really that significant. (I made another mistake and should have had 2467 instead of 2473 in my previous post. I got the 2473 figure because I had replaced all of the "365"s in f with "366"s and forgot about it. I know that these are upper and lower bounds for the corrected answer, though, and that's good enough for me).

Also, I thought that I'd mention that including leap day, the probability that all birthdays are represented among 10,000 people is still greater than 99%.

Oh, and AmbushBug, group definitely does matter. If we're talking about a group meeting of the Born on Leap Day Club, obviously everything I've done is gonna be bullshit. I'm assuming that it's just a random sample of the population (a safe assumption (with respect to birthdays, anyway) if ultimately we're applying this to this message board).
#15
11-18-2001, 05:48 AM
 Achernar Guest Join Date: Aug 1999
I tried deriving a formula for this like Cabbage so brilliantly did, but got nowhere; my respect overfloweth. Instead, I wrote a program which computes one million sample populations of any given size, and determines how many fit the criterion that every birthday is covered. Using the initial assumptions, I got:

For 2286 people: 49.9%
For 2287 people: 50.1%

This agrees pretty comfortably with Cabbage's results, so that was reassuring. Encouraged, I then redid the program to take leap days into account - I assumed that the probability of someone being born on 29 Feb was 1/1461, and for any other day, 4/1461. This gave me the following probabilities:

For 2286 people: 39.4%
For 2420 people: 49.9%
For 2421 people: 50.0%

This is below the lower bounds for Cabbage's result, though, so I guess I probably made an error somewhere.

I tend to think that birthdays not being evenly distributed would make a big difference, but I don't have the stats necessary to plug in probabilities and test this idea.
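
A minimal sketch of a simulation along these lines, in Python. This is not Achernar's actual program; the function name `estimate_coverage` and its parameters are assumptions for illustration. Each day gets a relative weight, so the uniform case, the leap-day case (weight 4 for regular days, 1 for 29 Feb), or real birth statistics, if someone has them, can all be plugged in the same way.

```python
import random


def estimate_coverage(n, weights, trials=1000, seed=None):
    """Estimate the chance that n random people cover every day.

    weights[k] is the relative likelihood of being born on day k.
    """
    rng = random.Random(seed)
    days = range(len(weights))
    hits = 0
    for _ in range(trials):
        # Draw n birthdays at once and see which distinct days turned up.
        seen = set(rng.choices(days, weights=weights, k=n))
        if len(seen) == len(weights):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    uniform = [1] * 365                # 365 equally likely days
    with_leap_day = [4] * 365 + [1]    # 29 Feb is one quarter as likely
    print("uniform, n=2287:", estimate_coverage(2287, uniform))
    print("leap day, n=2421:", estimate_coverage(2421, with_leap_day))
```

The default of 1,000 trials is only there to keep the run short. Pure Python would take a long while to get through a million trials, and with fewer trials the estimate will wobble by a percentage point or so from run to run.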
#16
11-18-2001, 08:10 AM
 Cabbage Guest Join Date: Sep 1999
Actually, now that I look at it, I have a pretty good idea that my function g is wrong.

The idea I had was that 365 of the n people had the 365 regular birthdays, and I wanted to find the probability that someone in the group was born on leap day. So I threw the 365 people out (n-365) (I know none of them were born on leap day), found the probability that none of the remaining people were born on leap day, and subtracted that from one (so that at least one was born on leap day).

1 - (1459/1460)^(n-365)

By the way, that should have been 1460/1461 (as Achernar has) instead of 1459/1460. I forgot to add 1.

The flaw is the old Monty Hall cars and goats type fallacy. For example, if I have two children, and I know one is a boy, what's the probability that the other is a girl? I can't just throw one of the kids out, and find the probability the other kid is a girl (which would give me 1/2); I don't know which child is a boy. The real probability is 2/3.

I don't know how much trouble offhand that will be to fix, maybe I'd be better off starting from scratch in that case (it's easy enough to fix when there are only two people and you're only considering boy/girl; but I imagine it may be messy when you're talking about n people and 366 different possible birthdays). I can't do it now, I'm tired and need some sleep; anyone else see a way to fix it?

Anyway, Achernar, because of this, I'm quite sure my lower bound is too high. I imagine your figure of 2421 is pretty close after all, if not spot on.
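
One possible way to fix it, offered as a sketch rather than a worked-out answer from the thread: skip the conditional step entirely and run the same inclusion-exclusion over all 366 days, letting leap day carry a smaller weight. With Achernar's probabilities (4/1461 for a regular day, 1/1461 for 29 Feb), forcing i regular days to be missing, and also leap day when j = 1, leaves a per-person probability of (1461 - 4i - j)/1461. The function name and parameters below are made up for illustration.

```python
from fractions import Fraction
from math import comb


def p_all_covered_with_leap_day(n, regular_days=365,
                                regular_weight=4, leap_weight=1):
    """Exact chance that n people cover every regular day and leap day."""
    total = regular_days * regular_weight + leap_weight  # 1461 by default
    numerator = 0
    for i in range(regular_days + 1):   # i regular days forced missing
        for j in (0, 1):                # j = 1: leap day forced missing too
            remaining = total - i * regular_weight - j * leap_weight
            sign = (-1) ** (i + j)
            numerator += sign * comb(regular_days, i) * remaining ** n
    return Fraction(numerator, total ** n)


if __name__ == "__main__":
    # Group sizes mentioned in the thread, for comparison with the
    # simulated percentages.
    for n in (2286, 2420, 2421):
        print(n, float(p_all_covered_with_leap_day(n)))
```

Because the sum treats all 366 days together, nobody has to be "thrown out" of the group, which is where the g(n) shortcut went wrong.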
