I remember being in a class once that illustrated “the birthday paradox” - in a random selection of only 30 people or so, there’s actually a very high statistical chance that two of them will have the same birthday. It’s one of those things that seems like it would be unlikely on a practical level, yet sure enough, two people in our class had the same birthday.
What is this an example of? What are some other things that seem like they’d be unlikely, but are actually statistically very likely to happen?
I’m not sure how to answer this. It’s an example of a pigeon-holing problem. Every birthday that’s “taken” leaves one less “safe birthday” for the next person to land on. Is that the answer you were looking for?
The odds at 23 people are right about 50/50 that at least two of them will have the same birthday. By 40 people it’s about 89%. By about 100 people the odds that no two people share a birthday are so infinitesimal that my PC rounds them to zero.
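A quick way to check these figures is to multiply together the chances that each new person avoids every birthday already taken. The sketch below assumes 365 equally likely birthdays (ignoring February 29 and seasonal clustering), so its results are an idealization:

```python
from math import prod

def p_shared(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday,
    assuming all `days` birthdays are equally likely and independent."""
    # P(all distinct) = (days/days) * ((days-1)/days) * ... * ((days-people+1)/days)
    # Once people > days a factor of 0 appears, so the result becomes 1.
    p_distinct = prod((days - k) / days for k in range(people))
    return 1 - p_distinct

for n in (23, 30, 40, 100):
    print(n, round(p_shared(n), 4))
```

Under those assumptions it gives roughly 0.507 for 23 people, 0.706 for 30, 0.891 for 40 and 0.9999997 for 100, which is why the 100-person case rounds to a certainty.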
Pigeon-holing problems can often have higher than ‘gut-feel’ probabilities.
Eventually, of course, you run out of numbers and the probability is 1.
Another example is something I was dealing with at work: a pseudo-random number between 1 and a million. After generating a hundred of them, what are the odds that some two of them will be the same number? About 0.5%, or roughly 1 in 200. That sounds pretty high, but after you’ve generated a hundred, one hundred numbers are no longer ‘safe’ - that’s one out of each ten thousand.
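The work example can be checked the same way. This sketch assumes the generator is uniform over 1 to 1,000,000 and compares the exact product with the common approximation 1 - exp(-d(d - 1)/(2N)), where d is the number of draws and N the pool size:

```python
import math

def p_collision_exact(draws: int, pool: int) -> float:
    """Chance that `draws` uniform picks from `pool` values contain a repeat."""
    p_distinct = 1.0
    for k in range(draws):
        p_distinct *= (pool - k) / pool  # k values are already "taken"
    return 1 - p_distinct

def p_collision_approx(draws: int, pool: int) -> float:
    """Standard approximation, accurate when draws is much smaller than pool."""
    return 1 - math.exp(-draws * (draws - 1) / (2 * pool))

exact = p_collision_exact(100, 1_000_000)
approx = p_collision_approx(100, 1_000_000)
print(exact, approx)
```

Both come out at roughly 0.0049, i.e. about half a percent.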
I’ve never viewed this as a real paradox. It’s more like a probability that you wouldn’t expect based on common sense or intuition. If you study probability, you will encounter many such examples. Another one is the well-known toothpick problem, usually called Buffon’s needle. If you draw a bunch of parallel lines, say one inch apart, and drop a one-inch toothpick on your picture, what’s the probability that the toothpick will cross a line? It’s almost 64% (exactly 2/π). Intuitively, you would expect a smaller probability. In the same vein, you might enjoy the Monty Hall Problem. Just google it.
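The toothpick figure can be checked with a Monte Carlo simulation. This is a sketch, assuming the toothpick length equals the line spacing and that its centre position and angle are uniformly random; the seed and trial count are arbitrary choices:

```python
import math
import random

def buffon_crossing_rate(trials: int, needle: float = 1.0, spacing: float = 1.0,
                         seed: int = 0) -> float:
    """Monte Carlo estimate of the chance a dropped needle crosses a line.
    Assumes needle <= spacing (the 'short needle' case)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # distance from the needle's centre to the nearest line
        x = rng.uniform(0, spacing / 2)
        # acute angle between the needle and the lines
        theta = rng.uniform(0, math.pi / 2)
        # it crosses if its half-length reaches past the nearest line
        if x <= (needle / 2) * math.sin(theta):
            hits += 1
    return hits / trials

estimate = buffon_crossing_rate(200_000)
print(estimate, 2 / math.pi)
```

With these settings the estimate should land close to 2/π ≈ 0.6366; for a needle of length L no longer than the spacing d, the theoretical value is 2L/(πd).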
(Just so you don’t need to click the link, a (cryptographic) hash function reduces an arbitrary message to a number of a fixed length. This passage explains why the resulting number has to be so long. (In short, Bad Things happen if it’s easy to generate two messages that hash to the same value. Ideally, the best method to do that should be to create every possible message and hash it, which is flatly impossible, but no hash function comes close to that ideal.))
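The linked passage isn’t reproduced here, but the birthday effect behind it is easy to demonstrate. The toy sketch below truncates SHA-256 to 24 bits (an arbitrary choice to make collisions findable; real hash outputs are far longer) and hashes numbered messages until two of them collide:

```python
import hashlib
from itertools import count

def truncated_hash(msg: bytes, nbytes: int = 3) -> bytes:
    # Deliberately weak toy hash: only the first `nbytes` bytes of SHA-256.
    return hashlib.sha256(msg).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Hash distinct messages until two share a truncated digest.
    Returns (first message, second message, number of messages tried)."""
    seen = {}
    for i in count():
        msg = f"message {i}".encode()
        h = truncated_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

a, b, tries = find_collision()
print(a, b, tries)
```

Even though there are about 16.7 million possible 24-bit values, a collision typically turns up after a few thousand messages, roughly the square root of the number of possible values. That square-root shortcut (the “birthday attack”) is generally why a hash’s output needs to be about twice as long as the security level you want.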
The one I use to demonstrate the effect uses the phone book; it’s easier. Tell someone you will bet them that, among the first entries on a page you pick at random, two phone numbers will have the same last 2 digits. Explain that there are 100 possible last-two-digit combinations and ask how many numbers they will let you include for an even bet. Plenty of people will say 50. Cut it in half and take 25. You are, in fact, about 25 to 1 on to find at least one matching pair. The first 12 numbers alone would give you a roughly even chance.
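The phone-book odds can be checked under the assumption that the last two digits are uniform and independent from one listing to the next (real directories may not quite behave that way):

```python
from math import prod

def p_match(numbers: int, endings: int = 100) -> float:
    """Chance that at least two of `numbers` phone numbers share their last two
    digits, assuming the 100 endings 00-99 are equally likely and independent."""
    return 1 - prod((endings - k) / endings for k in range(numbers))

for n in (12, 25, 50):
    p = p_match(n)
    # "odds on" = chance of winning divided by chance of losing
    print(n, round(p, 3), "odds on:", round(p / (1 - p), 1))
```

The sketch gives about 0.50 for 12 numbers and about 0.96 for 25 (odds of roughly 25 to 1 on); 50 numbers make a match nearly certain.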
I know so little about statistics, but I remember reading one of many debate posts about evolution here on SD, and someone pointed out a statistics example that is the opposite of what you’re looking for - this is an example of when statistics says something is highly unlikely when we know in a practical sense it’s 100% likely!
He pointed out that a Christian arguing that complex life cannot form from random chance because it’s too statistically unlikely (starting from B and looking backwards to A, I guess, is how the skeptic sees it) is like looking at a classroom full of 60 people. All these people have birthdays, but the odds of each one having his/her particular b-day are 1/365, so the odds of the whole class having the birthdays they have are (1/365) to the 60th power - astronomically unlikely! - yet there they sit!
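The arithmetic behind the classroom analogy: the chance of 60 particular people having 60 particular birthdays comes from multiplying 1/365 by itself 60 times, not from multiplying it by 60. A minimal sketch, assuming 365 equally likely birthdays:

```python
import math

people, days = 60, 365
log10_p = -people * math.log10(days)   # log10 of (1/365) ** 60
p = (1 / days) ** people                # still representable as a float (~1.8e-154)
print(f"(1/365)^60 = {p:.2e}, i.e. about 10^{log10_p:.1f}")
print(f"60/365 = {people / days:.2f}")  # what '1/365 times 60' would give instead
```

That works out to roughly 10^-153.7, whereas 60/365 would only be about 0.16.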
My math skills are rude and untutored, but if “probability is 1” means certainty, then I think you must be mistaken. No matter how many people are in the room, it’s theoretically possible that they all have the same birthday, so there’s no number of people that makes it certain that all 365 days are covered.
Similarly, in your random number example, it may be true that “after you’ve generated a hundred, one hundred numbers are no longer ‘safe’”, but then again, it may not.
The probability we’re looking at is the probability that at least two people share a birthday, or that two randomly generated numbers are equal. The probability you are thinking of is the probability that the next person has the same birthday as someone in the group, but in bup’s examples that is just a step in calculating the overall probability. If you have 365 people sharing the same birthday, you’ve added 363 people too many.
The “paradox” seems less unusual when you consider just how many comparisons are being made.
For 30 people, there aren’t just 30 comparisons being made; there are 30 × 29 / 2 = 435. Think of it like this.
Alice is in the room.
Bob comes in. Does he have the same B-day as Alice? No? Bring in Charlie.
Charlie comes in. Same B-day as Alice? Same B-day as Bob? No? Bring in Dale.
Dale comes in. Same B-day as Alice? As Bob? As Charlie?
By the time Zelda comes in, you’re comparing her to 25 others. And you just compared Yancey to 24 before that.
Sure, it’s still not likely for one person to match any of 25 others, but really, you’re looking at the chance of one person matching any of 25 others PLUS the chance of one person matching any of 24 others PLUS the chance of one person matching any of 23 others, etc. etc. etc.
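The Alice-to-Zelda walk-through can be replayed in code. Rather than adding up the per-newcomer chances (a fine rough picture, but it slightly overcounts), this sketch multiplies the chances that each newcomer misses everyone already in the room, which is exact if all 365 birthdays are equally likely; the function name walk_in is just illustrative:

```python
from math import comb

def walk_in(people: int, days: int = 365):
    """Person k compares against the k-1 already in the room.
    Returns (total comparisons, probability of at least one match)."""
    comparisons = 0
    p_no_match_yet = 1.0
    for k in range(1, people + 1):
        already_in_room = k - 1
        comparisons += already_in_room
        # newcomer must avoid every birthday already taken (all distinct so far)
        p_no_match_yet *= (days - already_in_room) / days
    return comparisons, 1 - p_no_match_yet

pairs, p = walk_in(30)
print(pairs, comb(30, 2), round(p, 3))
```

For 30 people it reports 435 comparisons and a match probability of about 0.706.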
?? If they all have the same birthday, then the condition is already satisfied.
If you had a room with 367 people in it, there is no way that all of them could have different birthdays, since there are only 366 possible birthdays (counting February 29). In other words, the probability would be 1 that some pair of them had the same birthday.
If n is the sample size, and m is the number of outcomes, then the probability that all outcomes are distinct is
$$\frac{n!}{(n-m)!\,n^{m}}$$
and the probability that at least two outcomes are identical is one minus this number.
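The formula can be evaluated exactly with Python’s math.perm (Python 3.8+) and fractions.Fraction. This sketch uses n for the pool size (e.g. 365 days) and m for the number of people, the labelling the poster settles on later in the thread; p_all_distinct is a hypothetical helper name:

```python
from fractions import Fraction
from math import perm

def p_all_distinct(pool: int, picks: int) -> Fraction:
    """n! / ((n - m)! * n**m) with n = pool size and m = number of picks.
    math.perm(n, m) equals n!/(n-m)! and returns 0 when m > n."""
    return Fraction(perm(pool, picks), pool ** picks)

p23 = p_all_distinct(365, 23)
print(float(1 - p23))              # chance of at least one shared birthday
print(p_all_distinct(366, 367))    # 367 people, 366 possible days
```

One minus the 23-person result is about 0.507. Because math.perm returns 0 when m exceeds n, the 367-person case with 366 possible days comes out as exactly 0 chance of all-distinct birthdays, matching the pigeonhole argument.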
“You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!”
The problem deals with two people having the same birthday, not everyone having a different birthday.
It’s true that as the number of people in a group increases, the probability that you’ve covered every day of the year approaches 1, but for any finite number, it’s less than 1.
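The “every day covered” question is a different problem from the birthday problem (it is usually called the coupon collector’s problem). The sketch below, assuming 365 equally likely birthdays, tracks how many distinct days have been seen as people arrive one at a time:

```python
def p_all_days_covered(people: int, days: int = 365) -> float:
    """Chance that `people` random birthdays cover every one of `days` days.
    Keeps a probability distribution over how many distinct days are covered."""
    dist = [0.0] * (days + 1)
    dist[0] = 1.0  # nobody in the room yet: 0 days covered
    for _ in range(people):
        new = [0.0] * (days + 1)
        for seen, p in enumerate(dist):
            if p == 0.0:
                continue
            if seen < days:
                new[seen + 1] += p * (days - seen) / days  # lands on a new day
            new[seen] += p * seen / days                   # repeats a covered day
        dist = new
    return dist[days]

for n in (364, 365, 3000):
    print(n, p_all_days_covered(n))
```

It returns 0 for fewer than 365 people, a vanishingly small but non-zero value (about 10^-157) for exactly 365, and something around 0.9 for 3,000 people; mathematically it climbs toward 1 without ever reaching it.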
Yeah, I think I mislabelled my variables. In my original formula, n should be the size of the pool that you’re selecting the objects from, and m should be the number of objects you select. So I probably should have swapped m and n.