Probabilities involving multiple groups

I really like the mathematics and probability threads on this board, so here’s one for you guys…

First the problem: Let’s say I have two decks of playing cards (52 unique cards each), each deck shuffled independently. I then proceed to draw pairs of cards off the top, one card from each at a time, and put the drawn cards face up on the table. How many pairs of cards do I need to draw before I have a greater than 50% chance of any two drawn cards being the same? How about for 75% or 95% chances?

The above assumes a 52 card deck, but what about arbitrarily sized decks? (The decks are always exactly the same in makeup, but not ordering.) Is there a (reasonably) simple formula for approximating where the 50% threshold is, if given the size of the deck?

This is actually a real-world problem, although it’s mostly just a discussion point now. There is a component of an application I work on that randomly pulls items from a population in the manner described above, and something I noticed was that it did not take very long (on average) for there to be overlaps in the randomly selected pools. This value is so far below what we expected that there was some suspicion about how random the selection really was! I figured this was a variation on the birthday paradox, but figuring out the actual probabilities involved is beyond my meager math skills.

Oof. It’s always easier for me to approach these problems backwardsly.

That is - look at it as a chance of pulling up two cards that aren’t identical, and see how long you can keep that string up.

Pull any card from Deck A, and you have a 51/52 chance of pulling out a different card from Deck B. It doesn’t matter how many cards you’ve drawn so far or not; the number of cards in the deck doesn’t affect the likelihood that two cards match.

Once you’ve dealt with the first card (51 out of 52 times), your chance of pulling a second card is ((51/52) * (51/52)) = 96.19%. Third card? ((51/52)(51/52)(51/52)) = 94.34%.

Again, this is the chance that the cards you’re drawing are different - to be the same, then, is 100% less whatever we’re up to (so, a 5.66% of having drawn the same card at least once in three draws).

So, by that - after 36 draws you have a 50.39% chance of having drawn identical cards. After going through the entire deck (52 draws), you have a 63.56% chance. Reshuffle and keep going, and at 72 draws you hit a 75.29% chance. To get all the way up to 95% you have to go through the deck over three times (158, but I may have miscounted).

A related problem, if you go through the deck once, the expected (average) number of matches will be exactly 1 (so you’ll sometimes have zero matches, and sometimes more than one, and they average out). Interestingly, this is true regardless of the number of cards in a deck.

It’s actually fairly easy to calculate the average number of times until the first match, as long as you assume that the shuffling process really is random (i.e., the next ordering of each deck is independent of the current ordering) and that all rearrangements are equally likely.

On each shuffling, there are two possibilities: that there is a match, or that there is no match. The probability that no object from a set of n objects matches its original position in a rearrangement is all but equal to 1/e for n > 4 (see here). Otherwise, the position of the match is uniformly distributed, and so the expected number of attempts is n/2.

Therefore, if N is the random variable denoting the number of attempts until a match, and E[N] denotes the expected value of N, then E[N] = n/2 * (1 - 1/e) + (E[n] + n)/e. I get that the solution to that equation is E[N] = n/2 * (e + 3)/(e - 1), which is roughly equal to 5n/3.

You can figure out the actual probabilities with a similar approach, but it’s a bit more involved.

Argh, wait - do you mean for any two of the total set of face cards - including ones already dealt with - to be identical?

Hoof. Okay, let’s see: a card drawn from a deck will never be the same as any other card drawn from that same deck (assuming you don’t play poker with cheaters), so we’re only checking between the decks.

So, in that case, the probability of the first two cards being different is still the same - 51/52.

But once the second set gets flipped over… we know that card A1 and card B1 don’t match, so it’s the chance that A1 is not B2, and A2 is neither A1 nor B1. (We don’t have to worry about any of the chances of cards from the B deck matching, as it’s a reflexive property (i.e., if B2 matches A1, we’ll have picked that up because A1 matches B2) and we’ve established that no two B cards will match (again, unless your deck contains two Kings of Spades or something).

So: A1 != B2 is a 51/52 shot.
A2 != B1 or B2 is a 50/52 shot.

So total probability so far is (51/52) * ((51/52)*(50/52)). [i.e. - chance of getting through flip #1 times chances of getting through flip #2] = 92.4%.

For the third flip… again, we know cards A1 and A2 don’t match B1 and B2, so it’s the chance that A1 is not B3, and A2 is not B3, and A3 is not B1, B2, or B3. So: (51/52)(51/52)(49/52).

Cumulative chances? (51/52) * ((51/52)(50/52)) * ((51/52)(51/52)*(49/52)) = 83.83%.

I don’t remember enough real math to truly denote this pattern - it’s [((53-n)/52)^n*((52-n)/52)] for the series of n = 1 to whatever. Or something - waaaaay too long since high school calculus.

But continuing the series:
4th flip is 73.00 %.
5th flip is 61.05 %
6th flip is 50.07 % (A 49% chance of a match occuring.)
7th flip is 39.42 % (A 60% chance of a match occuring.)
8th flip is 29.78 %
9th flip is 21.57 % (A 79% chance of a match occuring.)
10th flip is 14.98 %
11th flip is 10.20 %
12th flip is 6.65 %
13th flip is 4.05 % (a 95% chance of a match occuring.)

All number crunching done using MS calculator and my data entry skills, so someone with a real function calculator or programming tools will probably be by to show where I screwed up some math, but I think I have the basic tools down.

Yes, exactly. I should have been clearer: I’m looking for the chance that any cards that have been drawn have a match somewhere within. I’m not looking for a matching drawn pair, just a pair somewhere in the “face up and on the table” cards.

For what it’s worth, your calculated chances look similar to what I’ve seen in practice.

What I’m hoping is that there’s a formula I could plug into a scientific calculator (or excel spreadsheet) that could tell me what some % chance point is (in terms of draw count) based on the number of cards in the deck, but I’m guessing that may not be likely.

Wait, no, it’s easier than that. It’s the chance that Card A and Card B don’t match each other (51/52) and that Card A doesn’t match any previous Card Bs, and Card B doesn’t match any previous Card As. So [(51/52) * (52-(n-1)/52)^2] for the series of n.

Using Excel this time (my fingers are worn out, and this is an easier sequence for me to program in):

After the 7th flip, you have a 52.01% chance of any of the 7 cards from Deck A matching any one of the 7 cards from Deck B.

After the 10th flip, you have a 81.00% chance.

After the 13th flip, you have a 95.03% chance.

Well slick. Thanks!

Actually, not that hard in Excel - it’s what I did.

Set up Column A the numbers 1 through 20, one per row.

In B1, have the formula “=((n-1)/(n))*((n - (A1 - 1))/n)^2)” where “n” is the size of your pool (52 for cards, in my example). Copy that down the B column.

In C1, put in the formula “=B1”.

In C2, put in the formula “=B2*B1”.

In C3, put in the formula “=C2*B3” and copy that down the C column.

In D, put in the formula “=1 - C1” and copy that down the D column. Format D as percentages.

Voila! D1 is now the chance that you’ll get a match after 1 grab; D5 is the chance after 5 grabs, etc.