Basic Probability

"*What would result if one took a deck of playing cards, and tried to guess whether each card were black or red? This would involve placing the deck in a stack, face down, and making a guess on each face-down card prior to turning each card face up, one at a time.

What would the average positive results be? Would one be right about 50% of the time? or, 33%? maybe 25%? I was especially wondering if there is an accepted, statistical average percentage that can answer this question?* "

Returning to the original problem … Of course, if you just guess red or black at random, your expected number of correct guesses would be 26 or a 50% success rate. Let’s assume you’re smarter than this – that you will exploit complete knowledge of all prior turned-over cards in making your next guess. That means you will always guess the color that makes up most of the remaining deck. If there is an even number of red and black cards left in the deck, then you flip a coin and guess red or black with equal probability. For example, if the first card turned over is red, you will guess black for the next card. If the first two cards are red, guess black. If the first two cards are red and black (or vice versa), guess red or black. Etc. Of course, only the last card can be guessed with certainty under all scenarios. Working this out analytically, the exact answer for the expected number of correct guesses is 31.9979620, or a 61.5% success rate. Had to write a small program to do the tedious calculations that required the binomial coefficients and probabilities based on the above assumptions.

:smack:

I repeat:
:smack:

I had a bug in my shuffling. Suffice it to say that I was trying to be clever in one spot, and I shot myself in the foot. In any case, I’ve fixed the problem.

Comparing my (fixed) code against wolfman’s calculations:



2 card deck: 0.7500 +/- 0.0002 (in agreement with his analytic 0.7500)
4 card deck: 0.7085 +/- 0.0001 (in agreement with his analytic 0.7083)
6 card deck: 0.6835 +/- 0.0001 (in agreement with his analytic 0.6833)


Since these agree nicely, I now trust my code. :smack:

Updating my previous tables:

  • Expected number correct: 30.037 +/- 0.002 (in disagreement with nivlac’s calculation)

  • The distribution:



  num
correct     prob.      std.dev.
--------------------------------
  <26    0.0           (exact)
   26    0.0370520    0.0001572
   27    0.1032814    0.0002624
   28    0.1483201    0.0003145
   29    0.1659014    0.0003326
   30    0.1582914    0.0003249
   31    0.1326954    0.0002974
   32    0.1003934    0.0002587
   33    0.0682087    0.0002132
   34    0.0417547    0.0001668
   35    0.0231900    0.0001243
   36    0.0119560    0.0000893
   37    0.0054333    0.0000602
   38    0.0022287    0.0000385
   39    0.0008673    0.0000240
   40    0.0003187    0.0000146
   41    0.0000813    0.0000074
   42    0.0000213    0.0000038
   43    0.0000033    0.0000015
   44    0.0000013    0.0000009
   45   <0.000002      (95% CL)


  • exactly 40/52 in one shot: 0.000021 +/- 0.000004
  • exactly 33/52 in one shot: 0.0682 +/- 0.0002
  • at least 40/52 in one shot: 0.00043 +/- 0.00002
  • at least 33/52 in one shot: 0.1541 +/- 0.0003

My numbers are with card counting.

I wrote a small perl script to calculate the value exactly. I think. I’m sure people will point out any errors I make:



#!/usr/bin/perl

use strict;
my @cards;

sub GetProb($$) {
  my ($cardsLeft, $count) = @_;

  if ($count >= $cardsLeft || $count <= 0) {
    return $cardsLeft;
  }

  if (!defined($cards[$cardsLeft][$count])) {
    my $ratio = $count/$cardsLeft;
    my $guess = $ratio > .5 ? $ratio : 1 - $ratio;

    $cards[$cardsLeft][$count] = GetProb($cardsLeft - 1, $count) * (1 - $ratio)
      + GetProb($cardsLeft - 1, $count - 1) * $ratio + $guess;
  }

  return $cards[$cardsLeft][$count];

}

die "Usage:  $0 cards count
" unless $ARGV == 1;
print GetProb($ARGV[0], $ARGV[1]), "cards guessed correctly
";


It’s result jibes with Pasta’s latest result, printing 30.0407 cards.

I said:

aahala said:

So, was I correct, and off a bit because of rounding errors in my calculator, or was I just plain wrong, but happened to come close? I’m confused. :frowning:

Cool :cool:

I keep trying to figure out the exact function curve, but can’t quite get it.

I’m pretty sure the demoninator is (((N) choose (Half of N)) times N)
(I can’t get the coding to look right so I hope that makes sence in verbal form)

But I can’t get my brain around what the numerator is so I’m trying to do it backward, ie figure out the number, then create the function to make it work,

If possible could you tell me what you get for 8, 10, and 12 card decks with your program? With the denominator I can plug in the appoximate numerator to try to decipher the numerator of the function. It has got to be at least exponential complexity, or the function would head toward zero real quick.

aahala’s answer is an approximation, but good enough for any practical purposes.

I’ll do you one better. I’ve set the program running, doing 5 million trials at each deck size. The output is being directed here:

http://home.fnal.gov/~rbpatter/cardcounting/expectation.txt

New numbers will show up at the bottom automatically. It’s at a deck size of 20 right now…

Yes, you know this. I ran up this recursive definition of F(n,0) (which is D[sub]n[/sub] by definition) so that those with no previous experience with derangements (in the mathematical sense) could follow it from step one with no reference to authority like “D[sub]n[/sub] is this formula. Trust me.”

I modified my code to run through many different deck sizes, and to generate the standard deviation:



#!/usr/bin/perl

use strict;

my @expected;

sub GetProb($$)
{
  my ($redCards, $blackCards) = @_;

  if ($redCards == 0) {
    return ($blackCards, $blackCards);
  } elsif ($blackCards == 0) {
    return ($redCards, $redCards);
  }

  if (!defined($expected[$redCards][$blackCards])) {
    my $pickRed = $redCards / ($redCards + $blackCards);
    my $pickBlack = 1 - $pickRed;
    my $guess = $pickRed > $pickBlack ? $pickRed : $pickBlack;
    my $guessSquared = $guess ** 2;
    my @lessRed = GetProb($redCards - 1, $blackCards);
    my @lessBlack = GetProb($redCards, $blackCards - 1);

    $expected[$redCards][$blackCards][0] = $lessRed[0] * $pickRed +
     $lessBlack[0] * $pickBlack + $guess;
    $expected[$redCards][$blackCards][1] = $lessRed[1] * $pickRed +
     $lessBlack[1] * $pickBlack + $guessSquared;
  }

  return @{$expected[$redCards][$blackCards]};
}

die "usage: $0 max
" unless $#ARGV == 0;

for(my $i = 1; $i <= $ARGV[0]; $i++)
{
  my @sum = GetProb($i, $i);
  my $n = $i + $i;
  my $stdDev = sqrt($n * $sum[1] - $sum[0] ** 2);
  my $stdDev3 = 3 * $stdDev;

  printf "Cards: $n expected correct: %.4f +/- %.4f (68%) %.4f (99%)
",
   $sum[0], $stdDev, $stdDev3;
}


It generates expected values similar to Pasta. It gives different standard deviations, though. I believe that Pasta’s are incorrect. For example, for the simple case of a two-card deck, one black card and one red card:

By definition, std. dev. = sqrt[E([x - E(x)][sup]2[/sup])]. There are four cases:
Cards picked: RB. Guesses: BB, RB. Correct guesses: 1, 2
Cards picked: BR. Guesses: BR, RR. Correct guesses: 2, 1
E(x) = 1.5
E([x - E(x)][sup]2[/sup]) = 0.25
Std. dev = sqrt(0.25) = 0.5

My program says that the standard deviation is 0.5, but Pasta’s say it’s 0.000274.

Again, by the definition of the standard deviation, 68% of all cases are expected to fall within 1 standard deviation from the mean, and 99% of all cases are expected to fall within 3 standard deviations from the mean.

For a couple of lines of its output:
Cards: 2 expected correct: 1.5000 +/- 0.5000 (68%) 1.5000 (99%)
Cards: 4 expected correct: 2.8333 +/- 0.8660 (68%) 2.5981 (99%)
Cards: 6 expected correct: 4.1000 +/- 1.1832 (68%) 3.5496 (99%)
Cards: 8 expected correct: 5.3286 +/- 1.4702 (68%) 4.4105 (99%)
Cards: 10 expected correct: 6.5317 +/- 1.7355 (68%) 5.2065 (99%)

Cards: 50 expected correct: 28.9533 +/- 5.3855 (68%) 16.1566 (99%)
Cards: 52 expected correct: 30.0407 +/- 5.5290 (68%) 16.5869 (99%)
Cards: 54 expected correct: 31.1263 +/- 5.6703 (68%) 17.0109 (99%)
Cards: 56 expected correct: 32.2105 +/- 5.8096 (68%) 17.4289 (99%)
Cards: 58 expected correct: 33.2931 +/- 5.9471 (68%) 17.8412 (99%)
Cards: 60 expected correct: 34.3743 +/- 6.0827 (68%) 18.2480 (99%)

So, for 52 cards, you would expect to guess right on 30 cards, and 99% of the time you’ll guess from 14 to 46 cards correctly.

I have not quoted any numbers related to the width of the distribution. I have given the standard deviation of my estimate of the mean. If I didn’t do this, one could not meaningfully compare my estimates to analytic calculations (such as wolfman’s.) Yes, an RMS of 10[sup]-4[/sup] would be absurd.

Apologies if this wasn’t clear before.

You need to be careful with asymmetric distributions such as this one. You will never get only 14 cards correct. In fact, you’ll never get fewer than 26 correct.

For 52 cards, I find an RMS of 2.373 +/- 0.003. This comes from the distribution I posted above. This is inconsistent with your 5.53. I’ll have a look at your code in a little bit.

The 5.53 is almost definately wrong. Working from hand, the standard deviation for a 4 card deck should be about 0.66, not 0.87 like my program said.

Minor bump: over the weekend, I’ve had time to go over my code. The trick I used to roll up the information for the expected mean into a single number won’t work for the variance, as it’s a non-linear function. I altered my code to sum up the probabilites for each correctly-guessed number of cards, and to calculate the average value and standard deviation only at the end.


#!/usr/bin/perl

use strict;

my @correctGuess;

sub CalcExpected($$$)
{
  my ($prob, $correct, $incorrect) = @_;
  my $negation = 1 - $prob;
  my $limit = scalar(@$incorrect) > scalar(@$correct) ?
   scalar(@$incorrect) : scalar(@$correct) + 1;
  my @result;

  for(my $i = 1; $i < $limit; $i++)
  {
    $result[$i] = $incorrect->[$i] * $negation + $correct->[$i-1] * $prob;
  }

  return \@result;
}

sub SetProb($$)
{
  my ($redCards, $blackCards) = @_;

  if (!defined($correctGuess[$redCards][$blackCards])) {
    if ($redCards == 0) {
      $correctGuess[$redCards][$blackCards][$blackCards] = 1;
      return;
    } elsif ($blackCards == 0) {
      $correctGuess[$redCards][$blackCards][$redCards] = 1;
      return;
    }
    SetProb($redCards - 1, $blackCards);
    SetProb($redCards, $blackCards - 1);

    my $pickRed = $redCards / ($redCards + $blackCards);
    my $pickBlack = 1 - $pickRed;

    if ($pickRed > 0.5) {
      $correctGuess[$redCards][$blackCards] =
       CalcExpected($pickRed, $correctGuess[$redCards - 1][$blackCards],
        $correctGuess[$redCards][$blackCards - 1]);
    } else {
      $correctGuess[$redCards][$blackCards] =
       CalcExpected(1 - $pickRed, $correctGuess[$redCards][$blackCards - 1],
        $correctGuess[$redCards - 1][$blackCards]);
    }
  }
}

die "usage: $0 max
" unless $#ARGV == 0;

for(my $i = 1; $i <= $ARGV[0]; $i++)
{
  my $n = $i << 1;
  my ($sumX, $sumXSquared, $stdDev) = (0) x 3;

  SetProb($i, $i);

  for (my $j = 1; $j <= $n; $j++) {
   $sumX += $correctGuess[$i][$i][$j] * $j;
   $sumXSquared += $correctGuess[$i][$i][$j] * $j * $j;
  }

  $stdDev = sqrt($sumXSquared - $sumX * $sumX);
  
  printf "Cards: $n correct: %.4f +/- %.4f (68%%) %.4f (99%%)
",
   $sumX, $stdDev, $stdDev * 3;
}

It gives the results:

Cards: 2 correct: 1.5000 +/- 0.5000 (68%) 1.5000 (99%)
Cards: 4 correct: 2.8333 +/- 0.6872 (68%) 2.0616 (99%)
Cards: 6 correct: 4.1000 +/- 0.8307 (68%) 2.4920 (99%)

Cards: 50 correct: 28.9533 +/- 2.3276 (68%) 6.9828 (99%)
Cards: 52 correct: 30.0407 +/- 2.3733 (68%) 7.1198 (99%)
Cards: 54 correct: 31.1263 +/- 2.4181 (68%) 7.2542 (99%)

which prefectly matches Pasta’s calculation for the standard deviation for a 52 card deck. So, the answer to the OP is the same, about 30 cards will be correctly guessed, but you can expect to ususally guess between 28 and 32 cards correctly, and will almost always guess at least 15 cards incorrectly.

Interestingly enough, though, looking at the internal data generated in the program for a 52-card deck, while the average number of correct guesses will be 30, you will guess exactly 29 cards correctly more often than you guess exactly 30:

1-25: 0%
26: 3.70%
27: 10.32%
28: 14.82%
29: 16.60%
30: 15.84%
31: 13.31%
32: 10.01%
33: 6.79%
34: 4.18%
35: 2.34%
36: 1.19%
37-52: < 1%