Math/Probability Question (Percentages Upon Percentages)

Out of a pool of n white balls, x are chosen at random and painted with a blue dot. Then, from that same pool (which includes the now-painted balls), y are chosen and painted with a red dot.

What is the formula to determine how many (percentage-wise) of the balls have both red and blue dots?

xy/n^2 is the probability that a particular ball has both red and blue dots, so the expected percentage of balls with two dots is 100*xy/n^2.

Right, @Andy_L is correct: x/n is the fraction of balls with a blue dot, and y/n is the fraction with a red dot. The joint probability of the two events (i.e., a ball having both dots) is the product of their probabilities, because the two selections are made independently, so y/n * x/n = (xy)/(n^2). Like all probabilities, this will be a number in the range [0, 1], so you have to multiply it by 100 if you want a percentage.
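In case it helps to see the formula as code, here is a minimal Python sketch. The function names `percent_both` and `expected_count_both` are made up for illustration; the arithmetic is just the xy/n^2 formula from the posts above.

```python
def percent_both(n, x, y):
    """Expected percentage of balls with both a blue and a red dot.

    n: total balls, x: balls given a blue dot, y: balls given a red dot.
    Assumes both selections are made uniformly at random from all n balls.
    """
    if n <= 0 or not (0 <= x <= n) or not (0 <= y <= n):
        raise ValueError("need n > 0 and 0 <= x, y <= n")
    return 100 * x * y / n**2


def expected_count_both(n, x, y):
    """Expected number of balls with both dots: n * (x/n) * (y/n) = x*y/n."""
    return percent_both(n, x, y) / 100 * n


example_percent = percent_both(100, 40, 30)
example_count = expected_count_both(100, 40, 30)
print(example_percent, example_count)
```

With 100 balls, 40 blue-dotted and 30 red-dotted, this gives 12 percent, or 12 balls on average.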

In case a specific example helps:

Suppose 40% of the balls are painted with a blue dot.
Then suppose 30% of the balls are painted with a red dot.

If the balls were well-mixed, and every ball had an equal chance of being chosen to be painted with a red dot, then 30% of the blue-dotted balls should be painted with a red dot.

That’s 30% of the 40%, which is 12%. (0.30 * 0.40 = 0.12).

Caveat: If you did this for real, there’s no guarantee that it would come out to be exactly 12% (since randomness is involved). In fact, it couldn’t be exactly 12% if the total number of balls n were such that 12% of n was not a whole number.

I think this is a similar approach to estimating the number of fish in a lake, or other animal counting estimation. They catch a number of fish on one day, tag them, and toss them back into the lake. Then they come back again and catch a number of fish again, and count the number that had already been tagged. From that they can estimate the total number of fish in the lake. I’ve never done this, but it is an interesting application of statistics.
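The fish-tagging idea is the same formula run backwards: if about xy/n balls are expected to carry both dots, then n is roughly xy divided by the number seen with both. A minimal sketch of that estimate in Python follows. This is the simple textbook estimator usually called Lincoln-Petersen, which is my assumption about the method meant here, and the function name and the fish counts are invented for illustration.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Simple capture-recapture (Lincoln-Petersen) population estimate.

    tagged_first:     animals tagged and released on the first visit (x)
    caught_second:    animals caught on the second visit (y)
    tagged_in_second: how many of the second catch already had a tag

    Rearranges (expected overlap) = x*y/n into n = x*y / overlap.
    Assumes a closed, well-mixed population and no lost tags.
    """
    if tagged_in_second <= 0:
        raise ValueError("no tagged animals recaptured; cannot estimate")
    return tagged_first * caught_second / tagged_in_second


# Hypothetical numbers: tag 100 fish, later catch 80, of which 20 are tagged.
lake_estimate = estimate_population(100, 80, 20)
print(lake_estimate)
```

Here a quarter of the second catch was tagged, so the 100 tagged fish are taken to be about a quarter of the lake, giving an estimate of 400.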

Note that you can’t get an exact answer, because it’s random. For instance, if you put blue dots on half the balls and red dots on half the balls, you could end up choosing the same balls both times, giving you half of them with both red and blue dots, or you could end up choosing completely different balls, giving you none with both.

If you did the same experiment billions of times, and averaged over all those trials, you’d average a fraction xy/n^2 of the balls (that is, xy/n balls) having both red and blue dots, but you can’t say you’ll get that on any one trial.
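You don't need billions of trials to see this. Here is a rough Monte Carlo sketch in Python; the function name, trial count, and seed are arbitrary choices of mine, and it assumes each set of dots goes on distinct balls chosen uniformly at random.

```python
import random


def simulate_both(n, x, y, trials=10_000, seed=0):
    """Repeat the two-dot experiment and summarize the overlap counts.

    Returns (average, smallest, largest) number of balls that ended up
    with both a blue and a red dot across all trials.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = []
    for _ in range(trials):
        blue = set(rng.sample(range(n), x))  # x distinct balls get blue
        red = rng.sample(range(n), y)        # y distinct balls get red
        counts.append(sum(1 for ball in red if ball in blue))
    return sum(counts) / trials, min(counts), max(counts)


average, smallest, largest = simulate_both(n=100, x=40, y=30)
print(average, smallest, largest)
```

The average should land very close to xy/n = 12 balls (12% of 100), while the smallest and largest counts show how far a single trial can stray from it.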

An observation:
It seems to me that there are ranges of outcomes you could visualize. For instance, take the question: how many balls are left unspoiled, white only?
Using the 40% blue, 30% red example above: all of the balls with red dots could also have blue dots, but at most 75% of the balls with blue dots can also have red dots (30 out of 40). Conversely, it is possible that no ball with a blue dot has a red dot, and no ball with a red dot has a blue one. Obviously, any ball with two dots on it has to have one red dot and one blue. So anywhere from 30% to 60% of the balls could be white only.
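Those limits can be worked out for any n, x and y. A small Python sketch is below; the function name is hypothetical, and it uses the 40% blue, 30% red figures from the earlier example with n = 100.

```python
def dot_ranges(n, x, y):
    """Smallest and largest possible counts, whatever the random draw does.

    Returns a dict with the possible range of balls having both dots
    and the possible range of balls left plain white.
    """
    min_both = max(0, x + y - n)  # overlap is forced only if x + y > n
    max_both = min(x, y)          # the smaller group sits inside the larger
    min_white = n - (x + y - min_both)  # dots spread over as many balls as possible
    max_white = n - max(x, y)           # dots piled onto as few balls as possible
    return {
        "both": (min_both, max_both),
        "white_only": (min_white, max_white),
    }


ranges = dot_ranges(n=100, x=40, y=30)
print(ranges)
```

For 100 balls this gives 0 to 30 balls with both dots and 30 to 60 balls left white.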

I worked on a project in the early 1980s that did a similar thing to estimate populations of humpback whales in the waters around Alaska. When whales dive, they tend to stick their tails out of the water. Humpback whale tails have distinctive black-and-white patterns on them, which are as unique as fingerprints.

The researchers had a massive collection of photos of these, and they kept taking more photos. By matching new photos with previous photos, they could treat those as capture/recapture events, and get population estimates.

Somewhat similar is the German tank problem. You assume enemy tanks have been manufactured with consecutive serial numbers starting at 1. You observe the serial numbers on a set of tanks selected randomly. Based on the observed serial numbers, estimate the total number of tanks manufactured.
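For the curious, the usual textbook estimate for the tank problem is m + m/k - 1, where m is the largest serial number seen and k is how many tanks were observed. A minimal Python sketch follows; the function name and the sample serial numbers are made up for illustration.

```python
def estimate_tank_count(serials):
    """Estimate total tanks built from a random sample of serial numbers.

    Uses the standard frequentist estimate m + m/k - 1, assuming serials
    run consecutively from 1 and the sample is drawn without replacement.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # The largest serial seen, plus the average gap between observations.
    return m + m / k - 1


# Hypothetical sample of four observed serial numbers.
tank_estimate = estimate_tank_count([19, 40, 42, 60])
print(tank_estimate)
```

With a largest serial of 60 among four observations, the estimate is 60 + 15 - 1 = 74 tanks.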