Question related to the "urn problem" -- please check my intuition

[There ended up being more information in here than is strictly needed for the main issue of what your friend is going on about because I started writing before processing all the posts up-thread.]

Some of the questions here have unambiguous answers, but others are deep enough that they require crisp definitions and assumptions first. In particular, some of the questions have different interpretations depending on whether you define probabilities in a frequentist or Bayesian framework.

A Bayesian phrasing of the problem:
(1) The probability p of drawing a black bean is a random variable (or rather, behaves mathematically like one), and we can talk about the probability density (or probability distribution) for p. Call this f(p).
(2) Before making any measurements / drawing any beans, we have some prior knowledge that we encapsulate in a “prior probability density function” that we could label f_prior.
(3) After making measurements, we can use Bayes’ theorem to update our knowledge from f_prior to f_posterior (sketched below).
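For concreteness, here is a minimal numeric sketch of step (3). The uniform prior, the grid resolution, and the choice of 10 black beans in 10 draws are all illustrative assumptions, not part of the problem statement:

```python
import numpy as np

# Discretize p on a grid; densities are represented by their grid values.
p = np.linspace(0, 1, 1001)
dp = p[1] - p[0]
f_prior = np.ones_like(p)              # uniform prior (illustrative assumption)

# Observation: k black beans in n draws with replacement.
n, k = 10, 10
likelihood = p**k * (1 - p)**(n - k)   # binomial likelihood, up to a constant

# Bayes' theorem: posterior is prior times likelihood, normalized to unit area.
f_posterior = f_prior * likelihood
f_posterior /= f_posterior.sum() * dp

print("posterior mean of p:", (p * f_posterior).sum() * dp)  # ~0.917 here
```

The particular number 0.917 reappears later in the thread; it is an artifact of the uniform prior, which is exactly the point under discussion.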

When you really do have a definable prior probability density (sometimes shortened to just “a prior”), this is all well-defined. (See Note 1.) As the discussion continues, I’m happy to get into the nuances of defining the prior when you don’t really know anything (or, usually, when you don’t want to know anything). For now, I’ll just mention that these “uninformed” cases do not have a single solution.

A frequentist phrasing of the problem:
(1) The true probability p of drawing a black bean is a fixed, unknown constant. It makes no sense to talk about a probability density for p.
(2) One can only give statements about the probability of certain experimental outcomes, often as a function of p.
(3) A reported confidence interval is a range of values that has a known probability of containing the true value of p (which itself is an unknown fixed constant).

A 90% confidence interval in frequentist statistics means that the procedure used to construct the interval captures the true answer in 90% of repeated experiments. There is a 10% chance that the interval misses the true answer, and the interval doesn’t imply anything about how far outside it might be. The final answer is just a statement about the experiment’s setup and its calculable statistical behavior in the face of possible values for p.
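That coverage property can be checked directly by simulation. Here is a sketch, where the true p, the sample size, and the use of the exact (Clopper-Pearson) one-sided limit are all illustrative choices:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
p_true, n, trials, alpha = 0.7, 10, 100_000, 0.10  # illustrative values

# Repeat the experiment many times at a fixed (pretend-unknown) p.
k = rng.binomial(n, p_true, size=trials)

# Exact one-sided 90% C.L. lower limit on p for each outcome
# (defined as 0 when no successes are seen).
k_safe = np.maximum(k, 1)
lower = np.where(k > 0, beta.ppf(alpha, k_safe, n - k_safe + 1), 0.0)

# The procedure should cover the true p at least 90% of the time.
print("coverage:", np.mean(lower <= p_true))
```

The exact construction is conservative for small n, so the printed coverage comes out at or above 0.90 rather than exactly at it.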

The frequentist definitions work fine in an uninformed case, but the man-in-the-street is usually secretly thinking in terms of Bayesian probability densities. Since the words and quantitative results are similar in many scenarios, flip-flopping between frameworks in a single sentence is common. Sometimes this happens because the jargon is more convenient (if imprecise) that way. Sometimes it happens mistakenly. And sometimes it leads to incorrect inferences.

So, getting to some actual thread content –

The definitions of “success” and “failure” are switched between the two threads. If you draw a string of black beans only, then you can give a 90% confidence level upper limit on 1-p, or equivalently a 90% confidence level lower limit on p. These limits are frequentist.

Right. You cannot say that the next bean has a 9/10 chance of being black or anything like that. You can only say that the probability p has a 90% chance of being above the 90% confidence level (C.L.) lower limit. On the flip side, you can also say that it is unlikely that the next bean will be white (using “white” to mean “non-black”) since you have a 90% C.L. upper limit on 1-p.
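To put numbers on the all-black case: after n straight black draws, the 90% C.L. lower limit on p is the p_L for which the observed run would occur only 10% of the time, i.e., p_L^n = 0.10. A two-line sketch with the thread’s ten draws:

```python
n, alpha = 10, 0.10
p_lower = alpha ** (1 / n)   # solves p_lower**n == alpha
print(p_lower)               # ~0.794: 90% C.L. lower limit on p
```

Equivalently, 1 - p_lower ≈ 0.206 is the 90% C.L. upper limit on the probability of a white bean.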

It’s the same situation philosophically, although the math under the hood might change a little. One practical change is that the issue of defining an uninformed prior for the Bayesian framework looks more tractable (even though it isn’t). In the extreme case, consider knowing that there are exactly 4 beans and you are drawing with replacement. You could enumerate all possibilities (0 or 1 or 2 or 3 or 4 beans are black and the others are white), and you could say that each of these scenarios has equal prior probability. The Bayesian approach would then happily give you a real, honest-to-God answer for “What is the probability that the next bean I draw is black?” But, that weighting of the five possibilities is arbitrary and not based on your true knowledge. After all, why not include the combinatorial factors in the weighting, making the (1,3) case become four times more likely, a priori, than either the (0,4) or (4,0) cases? Without actual prior knowledge that you really want to inject, these sorts of priors can quickly become unjustified crutches to allow one to get to a Bayesian-style answer.
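Here is a sketch of that 4-bean enumeration, computing the “honest-to-God answer” under both weightings so the arbitrariness is visible. The choice of three observed all-black draws is purely illustrative:

```python
from math import comb

N = 4                      # beans in the urn, drawn with replacement
m = 3                      # observed all-black draws (illustrative choice)
hypotheses = range(N + 1)  # b = number of black beans: 0, 1, 2, 3, 4

def p_next_black(prior_weights):
    # Posterior over b after m all-black draws, then P(next draw is black).
    post = [w * (b / N) ** m for b, w in zip(hypotheses, prior_weights)]
    return sum((b / N) * q for b, q in zip(hypotheses, post)) / sum(post)

flat = [1] * (N + 1)                              # five scenarios, equal weight
combinatorial = [comb(N, b) for b in hypotheses]  # (1,3) weighted 4x (0,4)

print("flat prior:         ", p_next_black(flat))           # ~0.885
print("combinatorial prior:", p_next_black(combinatorial))  # ~0.759
```

Same data, two defensible-looking “uninformed” priors, two different answers.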

This is tied up with the preceding parts of this post. Feel free to re-pose this question if it’s still hanging.

The rule of succession injects an assumption that needn’t be there. Namely, it assumes that you know that both successes and failures are possible, and that you know this in a very particular way. With the rule of succession, one assumes that you’ve seen exactly enough evidence to know that both success and failure are possible. This amounts to starting the experiment with one success and one failure already under your belt. There are other ways you could know that both success and failure are possible (e.g., you could be told this, but not given probabilities for each). So, your friend is assuming Laplace’s rule, but that’s one of an infinite number of ways you might know that both success and failure are possible. Injecting a specific prior does allow you to give a posterior probability (in this case, 0.917), but that posterior probability is only as good as the prior (GIGO). Naturally, this is all in a Bayesian framework.
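For reference, the rule of succession is just the posterior mean of p under a uniform (Beta(1,1)) prior, which is where the “one success and one failure already under your belt” picture comes from:

```python
def rule_of_succession(successes, failures):
    # Laplace's rule: P(next success) after the observed record,
    # equivalent to pre-loading one success and one failure.
    return (successes + 1) / (successes + failures + 2)

print(rule_of_succession(10, 0))  # 11/12 ~ 0.917 after ten straight successes
```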

One way to emphasize the relevance of GIGO to your friend is to suggest a hypothetical wager. You would come to him with an urn, and you would truthfully tell him that there are 1000 beans, and that both success (black) and failure (white) are possible. You’d allow him to draw ten beans, and if any are white the game is over. If he draws ten black beans, however, he would make a wager on the next bean being white. He should be assigning a probability of about 0.083 (i.e., 1/12) that the next bean will be white, so he’d be willing to take, say, 99-to-1 on it. Obviously if you’ve prepared the urn with 999 black beans and 1 white bean, he’ll be losing money with his apparently good wager.
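A quick simulation sketch of that wager. The urn composition and payout are the ones just described; drawing without replacement is an assumption, since the setup doesn’t pin it down:

```python
import numpy as np

rng = np.random.default_rng(1)
urn = np.array([1] * 999 + [0])   # 1 = black, 0 = the single white bean
trials, games, winnings = 50_000, 0, 0.0

for _ in range(trials):
    draw = rng.permutation(urn)[:11]   # ten draws plus the wagered bean
    if draw[:10].all():                # the game continues only on ten blacks
        games += 1
        # He stakes 1 unit on "white next" at 99-to-1.
        winnings += 99.0 if draw[10] == 0 else -1.0

print("average result per completed game:", winnings / games)  # ~ -0.9
```

His rule-of-succession probability of 1/12 says the 99-to-1 offer is a bargain; the actual composition (1 white among the remaining 990 beans) says he loses about 0.9 units per game.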

Note 1: An urn example with a well-defined prior: I show you 1000 urns, each containing 1000 beans. One urn has one black bean. The next has two, then three, and so on up to the last urn, which has 1000 black beans. The other beans in the urns are all white. I choose an urn uniformly at random, and then you draw ten beans from the urn, all of which are black. If I start asking questions about the probability of drawing another black bean from that selected urn, you have a crisp prior to work from, the math works it all out, and it all matches intuition. The trouble comes when making the leap to an “uninformed” situation and wishing all the niceness would make the leap, too. (It doesn’t.)
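A sketch of the Note 1 arithmetic, assuming the ten beans are drawn without replacement:

```python
import numpy as np

N, draws = 1000, 10
urns = np.arange(1, N + 1)   # urn i holds i black beans out of 1000

def all_black_prob(i):
    # P(ten black beans, drawn without replacement, from urn i)
    if i < draws:
        return 0.0
    out = 1.0
    for j in range(draws):
        out *= (i - j) / (N - j)
    return out

likelihood = np.array([all_black_prob(i) for i in urns])
posterior = likelihood / likelihood.sum()   # prior over urns is uniform

# Predictive probability that the next bean from the chosen urn is black.
print(np.sum(posterior * (urns - draws) / (N - draws)))  # ~0.917
```

The crisp prior (each urn equally likely) makes everything downstream well-defined; there is no hand-waving step.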

A = undefined.
B = undefined.
Given your assumption that “you don’t know anything about what kinds of beans could be in the jar”, you can’t know the probability of drawing a black bean or having all black beans. You can set (frequentist) limits on these probabilities, though. To set Bayesian limits or to calculate Bayesian probabilities, you would have to alter your assumption that “you don’t know anything” going in. (This is what your friend is implicitly doing by invoking Laplace’s rule. He assumes a very specific, arbitrary set of prior knowledge.)

I don’t have time now to comment on the whole thing again, but I do want to comment on this common misconception. I assume you say (1/2)^990 since, after all, there are two possibilities, black and not black, so it’s a 50-50 proposition 990 times. To see why this is wrong, note that it would be equally “correct” to say there are three possibilities, black, blue, and neither, so the answers would be (1/3)^990 and 1/3.

Just because there are n possibilities doesn’t make the probability of each one 1/n, unless you have some reason to believe the possibilities are equally likely. That is basically where Bayesian statistics comes in: you must assume prior knowledge, embedded in what’s called the prior distribution.

My understanding of the problem is this:

Assuming you know how many beans are in the jar (as was mentioned in the original post), you should be able to solve it. In this problem, the colours of the beans don’t matter beyond “black” and “not black”.

Let X represent the number of black beans in the jar initially. We don’t know this number.
Let N represent the total number of beans in the jar initially.

The probability that the first bean drawn is black is equal to X / N.

The probability that the first AND second beans are black is (X / N) * ((X - 1) / (N - 1)), and the probability that successive beans will all be black follows this pattern. Thus, the probability that the first 12 drawn beans will be black is:

(X! / (X-12)!) / (N! / (N-12)!)

The possible values for X are between 12 and N, with “N” being the only option that satisfies the question “Are all of them black?”

Now what we do is plug in these values to the probability we calculated earlier.

For X = N, the calculated probability is 1

What we need to calculate is the probability that the value of X is N out of all possible values of X. In this case, it’s 1 [i.e., when X = N] divided by the sum of the probabilities for all values of X from 12 to N (including N).

This is the probability that all of the beans are black.
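A sketch of that whole computation, under the poster’s implicit assumptions (a uniform prior over X, and a known total; N = 20 below is purely illustrative):

```python
from math import prod

N, drawn = 20, 12   # N is illustrative; the setup assumes you know it

def all_black_likelihood(X):
    # P(first 12 beans drawn are black | X black of N total, no replacement)
    # = (X/N) * ((X-1)/(N-1)) * ... = (X!/(X-12)!) / (N!/(N-12)!)
    return prod((X - j) / (N - j) for j in range(drawn))

likelihoods = {X: all_black_likelihood(X) for X in range(drawn, N + 1)}

# With a uniform prior over X, P(X = N | 12 black draws) is the X = N
# likelihood (which is 1) divided by the sum over all possible X.
print(likelihoods[N] / sum(likelihoods.values()))
```

Note that the “divide by the sum” step is exactly the uniform-prior Bayesian move discussed earlier in the thread; with a different prior over X, the answer changes.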

No, yeah, I realized that I’d screwed that up a little later… I was under a kind of residual effect from a hypothetical I’d been considering. :wink: Corrected it in another conversation I’m having elsewhere but forgot I’d done it here.

I don’t remember the calculus to solve this problem, but I am able to describe a solution. Make a graph where the X-axis is the percentage of black beans in the jar, from 0% to 100%. The Y-axis is the probability of selecting 10 out of 10 black beans for that percentage on the X-axis. So there’s a 0% chance of having 0% black beans in the jar: you know this because you drew 10 black beans. On the other side of your X-axis, if there are 100% black beans in the jar, your chance of drawing 10 out of 10 black beans is 100%. For all the other values on the X-axis, your Y-axis value is that X-axis percentage (as a decimal) raised to the power of 10. So if the jar was 20% black beans, your odds of selecting 10 out of 10 black beans were 0.2^10, or 0.0000001024. If the jar was 80% black beans, the odds would be 0.8^10, or 0.1073741824. At 99% black beans, the odds are 0.99^10, or 0.904382075008804. What you’re drawing on your graph is a curve. In statistics, it’s called a confidence curve. So you’re 90.44% confident that the jar contains at least 99% black beans.

Using calculus, you can calculate the area under a curve. Your total area is going to be 1 (100% times 100%). The probability of the jar being 100% black beans, based on the fact that you drew 10 out of 10 black beans, is the area under the curve (technically, it’s the area under the curve divided by the total area, but see the last sentence).
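A sketch of that area computation, with the implicit uniform prior made explicit. The curve is y = p^10; its area over [0, 1] is exactly 1/11, and dividing the curve by that area gives the Beta(11, 1) posterior density 11·p^10, from which probability statements about p can be read off:

```python
import numpy as np

p = np.linspace(0, 1, 100_001)
dp = p[1] - p[0]
curve = p ** 10                  # P(10 of 10 black | fraction black = p)

area = curve.sum() * dp
print(area)                      # ~1/11 = 0.0909...

# Normalizing the curve (uniform prior assumed) gives a posterior density,
# and areas under it are probabilities, e.g. P(p >= 0.99 | ten black draws):
posterior = curve / area
print(posterior[p >= 0.99].sum() * dp)   # 1 - 0.99**11 ~ 0.105
```

Whether those areas deserve to be called probabilities at all is exactly the prior-dependence issue Pasta raises above.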

Pasta, I really appreciate the posts you’ve made in this thread: you’ve really helped me with this issue.
If anybody is interested, the economist Brad DeLong discusses Bayesianism, frequentism, and the urn problem in the context of electoral prediction. He does it in the form of a mock Socratic dialogue, to contrast a few possible points of view. http://delong.typepad.com/sdj/2014/08/tykhes-nonexistent-urn-and-senate-election-probabilities-over-at-equitable-growth-philosophy-of-probability-iii-the-philo.html

Anyway, my point is that this stuff has wide application and is worthy of reflection.