2 envelopes problem revisited

Here’s your problem (partly pointed out by others already): You cannot have 2X and (X/2) on the left-hand side of your equation. The chosen envelope (call it Env1) cannot contain both the smaller and the larger amount at once. Here’s a correct way of finding the p that you want. Let the two amounts be X and 2X. Choose one of the envelopes (call it Env1) at random. Then the expected value in Env1 is clearly 1.5X. The expected value in the other envelope, Env2, is p(X) + (1-p)(2X) or p(2X) + (1-p)(X) (choose either one; it does not matter). Set either one equal to 1.5X and you’ll arrive at p = 1/2, as expected.
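
For concreteness, here’s that last step checked symbolically (a quick sketch using sympy; the equation is exactly the one above):

[code]
from sympy import symbols, Eq, solve

p, X = symbols('p X', positive=True)

# E[Env2] = p*X + (1 - p)*2X; set it equal to E[Env1] = 1.5X and solve for p.
print(solve(Eq(p*X + (1 - p)*2*X, 1.5*X), p))   # [0.500000000000000]
[/code]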

I can’t make any sense of this argument that X is being used in two different ways. It’s not. In Saffer’s posts, X does not mean the minimum or maximum amount; X is the amount in your envelope, regardless of whether that is more or less than the amount in the other envelope.

Yes, but you’re not just talking about some theoretical infinite limit. You’re creating a situation with a finite limit that’s low enough to affect the outcome. There’s a big gap between infinity and 500.

Sure you can. If you set X as the amount in the right-hand envelope (and there’s no reason you can’t do this), then the amount in the left-hand envelope is 2X or X/2.

There’s an equally big gap between infinity and 100 trillion. Any finite limit is going to resolve the paradox in exactly the same way.

But 500 was a factor if you open the first envelope and find $400 in it. You now know the other envelope can’t contain $800, so it must contain $200. (And this is an example Saffer gave.) You don’t have to raise the finite limit to infinity or 100 trillion; just raise it to 1000 and you’ve ruled that case out. Or just don’t tell the person making the choice what your finite limit is: he wouldn’t know there was a limit of 500 unless he was told. Then he would open the envelope containing $400 and have no reason to think it wasn’t possible for the other envelope to contain $800.
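
To make the arithmetic in that example concrete (a small sketch; the function is mine, and the 50/50 split in the no-known-limit case is the usual equal-likelihood assumption):

[code]
def expected_gain_from_switching(seen, limit=None):
    # The other envelope holds either seen/2 or 2*seen, equally likely a priori.
    if limit is not None and 2 * seen > limit:
        return seen / 2 - seen              # 2*seen is impossible: a sure loss
    return 0.5 * (seen / 2 - seen) + 0.5 * (2 * seen - seen)

print(expected_gain_from_switching(400, limit=500))   # -200.0 (it must be $200)
print(expected_gain_from_switching(400, limit=1000))  # 100.0
print(expected_gain_from_switching(400))              # 100.0 (no known limit)
[/code]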

As I said, you may be justified in saying there is an upper limit at some point. But fixing that upper limit, telling the player what the upper limit is, and having the amount in the first envelope be more than half of the upper limit are all new factors that didn’t exist in the original paradox. If you add these factors in, you’re working on a different question.

The paradox is this: Conditioned on any particular value for your envelope, the expected profit from switching is positive, which leads us to say that unconditionally, the expected profit from switching is positive. Conditioned on any particular value for the other envelope, the expected profit from switching is negative, which leads us to say that unconditionally, the expected profit from switching is negative. And conditioned on any particular unordered pair of monetary values for the two envelopes, the expected profit from switching is zero, which leads us to say that unconditionally, the expected profit from switching is zero.
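
Spelled out under the equal-likelihood assumption the paradox trades on (a sketch; x, y, and m stand for the conditioned values, and the helper names are mine):

[code]
# Expected profit from switching, conditioned three ways, always assuming
# "the other amount is double or half, 50/50 either way".

def given_mine(x):    # condition on your envelope holding x
    return 0.5 * (2 * x - x) + 0.5 * (x / 2 - x)    # = +x/4

def given_other(y):   # condition on the other envelope holding y
    return 0.5 * (y - y / 2) + 0.5 * (y - 2 * y)    # = -y/4

def given_pair(m):    # condition on the unordered pair {m, 2m}
    return 0.5 * (2 * m - m) + 0.5 * (m - 2 * m)    # = 0

print(given_mine(8), given_other(8), given_pair(8))  # 2.0 -2.0 0.0
[/code]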

And the explanation is this: Yup. That’s right. The sign of the expected profit (equivalently, the sign of the total profit over all cases) changes based on how you calculate it; it is given by a conditionally convergent summation. If a summation has infinitely many positive and negative terms, it is indeed possible for its value to change based on how you group the calculation, as you have just shown with this example.

It’s the same phenomenon as that 1 - 1 + 1 - 1 + 1 - 1 + … = (1 - 1) + (1 - 1) + (1 - 1) + … = 0 + 0 + 0 + … = 0, while also 1 - 1 + 1 - 1 + 1 - 1 + … = 1 + (-1 + 1) + (-1 + 1) + (-1 + 1) + … = -1 + 0 + 0 + 0 + … = -1, and also 1 - 1 + 1 - 1 + 1 - 1 + … = 1 - (1 - 1 + 1 - 1 + …), and thus must equal 1/2.

Any finite truncation “resolves” the paradox because a finite summation cannot give two different values based on how you group it. But this is essentially to ignore the actual underlying phenomenon (after all, there is more than one way to truncate; we can specify a lowest and highest value altogether (giving an expected profit of zero by switching), a lowest and highest value for your envelope alone (giving a positive expected profit by switching), or a lowest and highest value for the other envelope alone (giving a negative expected profit by switching)).
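
Here’s a sketch of those three truncations, using equal unit weights on pairs of adjacent powers of two (the concrete weight picture made explicit further down the thread; N and the helper are mine):

[code]
# One unit of weight at each point (X, Y) = (2**a, 2**b) with |a - b| = 1.
# Total profit from switching = sum of (Y - X) over whichever points we keep.
N = 10
all_pts = [(a, b) for a in range(-3 * N, 3 * N + 1) for b in (a - 1, a + 1)]

def total(keep):
    return sum(2.0**b - 2.0**a for a, b in all_pts if keep(a, b))

print(total(lambda a, b: -N <= min(a, b) and max(a, b) <= N))  # 0.0 (bound the pair)
print(total(lambda a, b: -N <= a <= N))   # +1023.9995... (bound your envelope alone)
print(total(lambda a, b: -N <= b <= N))   # -1023.9995... (bound the other alone)
[/code]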

I think most of the answers are misleading or overly complex, or miss the point of the OP’s question.

To demonstrate this, consider a simplified problem: You know the amounts in the two envelopes are $1 and $2 exactly, but you haven’t opened either envelope.

Note that the “paradox” continues, despite there being only a single case, no infinities, and no ambiguity about distributions. You know each envelope holds $1.50 on average, yet the switching argument still says E(other) = 5/4 * E(this).

The fallacy is simple, though I disremember the proper technical terminology. You assume
p(Y=2X) = 1/2 independent of X
but this is not the case.
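
A quick enumeration of the $1/$2 case shows exactly where that assumption fails:

[code]
# The $1/$2 case: the unordered pair is always {1, 2}.
cases = [(1, 2), (2, 1)]                  # (mine, other), equally likely

print(sum(y for _, y in cases) / 2)       # 1.5: E(other) = E(this), not 1.25*E(this)
print([y for x, y in cases if x == 1])    # [2]: given X = 1, Y is 2 with certainty
print([y for x, y in cases if x == 2])    # [1]: given X = 2, Y is 1 with certainty
[/code]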

Right, if you know the amounts in the two envelopes are $1 and $2, you don’t have p(Y = 2X) = 1/2 independent of X. This is a (very) finite truncation, where you’ve set a lower and an upper bound on the possible monetary values.

But the whole point of the paradox, as I see it, is to consider what would happen if you DID have p(Y = 2X) = p(X = 2Y) = 1/2 independently of X. And in that case, all the “paradoxical reasoning” is flawless; you just happen to have a conditionally convergent summation.

[Here’s another simplified problem: You know the value in the other envelope is $1, but you haven’t opened either envelope.

In this case, you don’t have p(Y = 2X) = 1/2 independent of X either, but the average profit from switching is negative.

Here’s another simplified problem: You know the value in your envelope is $1, but you haven’t opened either envelope.

In this case, you don’t have p(Y = 2X) = 1/2 independent of X either, but the average profit from switching is positive.

Picking finite truncations at random doesn’t resolve the problem, since you can truncate it in different ways, and it’s not about the finite truncations anyway.]
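
For concreteness, the arithmetic for those two bracketed problems, again under the equal-likelihood assumption:

[code]
# Other envelope known to hold $1: yours is $0.50 or $2.00, equally likely.
print(0.5 * (1 - 0.5) + 0.5 * (1 - 2))    # -0.25: switching loses on average

# Your envelope known to hold $1: the other holds $0.50 or $2.00, equally likely.
print(0.5 * (0.5 - 1) + 0.5 * (2 - 1))    # 0.25: switching gains on average
[/code]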

Can you show me an X and Y that have that property? I have a hard time seeing it.

That’s not a property of a particular X and Y. That’s a property of the distribution of (X, Y) values we assume.

Just imagine equal weights placed at each of the infinitely many points (2^a, 2^b) where a and b are adjacent integers.

For any particular X value, there will be two possible Y values, with equal weights on them; one with Y being twice X, and one with Y being half X.
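
If it helps, here’s a finite window of that placement (a sketch only; the real picture extends over all integers a):

[code]
from collections import defaultdict

# One unit of weight at each (2**a, 2**b) with a and b adjacent integers.
columns = defaultdict(list)
for a in range(-3, 4):
    for b in (a - 1, a + 1):
        columns[2.0**a].append(2.0**b)

for x in sorted(columns):
    # every X value: exactly two Y values, x/2 and 2x, with equal weight
    print(x, sorted(columns[x]))
[/code]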

[To pedantic mathematicians: Yes, yes, this cannot be a Kolmogorov-style probability distribution, but why should we restrict ourselves to that particular formalization of the intuition of weight distributions? In the same way in which it is reasonable and common to say “A random integer has probability 1/2 of being even. Two random integers have probability 6/π[sup]2[/sup] of being coprime.”, using a non-Kolmogorov notion of a uniform distribution over the integers, so it is reasonable to consider the above distribution of weights as legitimate.]

How is that a probability distribution? I can’t see how you can have a non-zero weight.

The weight of each particular point is infinitesimal in ratio to the total weight, just as in the uniform distribution on [0, 1].

You might object “But there are no infinitesimals other than zero, and a bunch of zeros can’t add up to unity as they have to!”.

Well, you wouldn’t raise that objection for the uniform distribution on [0, 1]. You are only raising it here because you have been trained to say that for countable domains, yet refrain from demanding it for uncountable domains (by familiarity with the Kolmogorov formalization of distributions).

But what’s so special about the Kolmogorov formalization? It’s not the end-all, be-all.

As an analogy, it is perfectly common and reasonable to say such things as “A random integer has probability 1/2 of being even.”, “Two random integers have probability 6/π[sup]2[/sup] of being coprime.”, etc., formalizing the naive intuition of a uniform distribution of integers via asymptotic frequency. But this would also be a non-Kolmogorov distribution! One might equally say “What is the weight of a particular integer? It would have to be zero, and yet they can’t all be zero!” here, yet nonetheless, it’s a kind of formalization of the intuition of the ordinary-language concept of “distribution” it is perfectly reasonable to consider as well.
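
On the asymptotic-frequency reading, both of those statements can be checked numerically (a sketch; a finite N stands in for the limit):

[code]
from math import gcd, pi

N = 1000
evens = sum(1 for n in range(1, N + 1) if n % 2 == 0) / N
coprime = sum(1 for a in range(1, N + 1)
                for b in range(1, N + 1) if gcd(a, b) == 1) / N**2

print(evens)               # 0.5
print(coprime, 6 / pi**2)  # ~0.6084 vs ~0.6079
[/code]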

The two-envelope paradox is about a particular intuition, and I propose we deal with the intuition head-on, rather than wave it away as not matching some particular formalization. If need be, fine, it’s not about “distributions”, it’s about “blistributions”. Now let’s see what’s going on in the “blistribution” it asks us to consider…

Don’t even think about probability distributions; that tends to distract. Just imagine actual, solid brass, five-pound weights, with one placed at each of the infinitely many points (2^a, 2^b) where a and b are adjacent integers.

Then, just as I said: For any particular X value, there will be two possible Y values, with equal weights on them; one with Y being twice X, and one with Y being half X.

What a silly objection it would be to say “You can’t place weights like that! Someone once chose to study a formalization of weight distributions which did not include examples such as this.”

Getting rid of σ-additivity is non-trivial, especially for people with no education in real analysis (like me and, I assume, most of the posters in this thread).

But that someone studied a formalization that does allow placing weights in that way. Such a placement is a measure. But it cannot be made into a probability measure.

I’m not objecting to the placement of the weights per se, but to the notion that there is a perfectly inscrutable method of selecting one of those weights.

On one technical account of what a probability measure is. But we could still try to reason about it as a probability measure, on some more general account of what a probability measure could be (much like asymptotic frequency can be considered a generalized probability measure). The paradox isn’t really about the difficulty in describing the normalization factor.

There is the observation that E[X | Y] = 1.25Y, so E[X] = 1.25E[Y], while simultaneously E[Y | X] = 1.25X, so E[Y] = 1.25E[X]. This seems paradoxical, but can be explained by E[X] and E[Y] both being infinite. Of course, there is the potential of indeterminacy issues when subtracting infinite quantities from other infinite quantities, and we see precisely that with the conditional convergence when one tries to calculate E[Y - X].
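
You can see that infinitude directly (a sketch under the same powers-of-two weight picture, with truncated averages standing in for the full expectation):

[code]
# Truncated averages of X (sum of X over kept points, divided by their count)
# under the picture of unit weights at (2**a, 2**b), |a - b| = 1.
for N in (5, 10, 20, 40):
    pts = [(2.0**a, 2.0**b) for a in range(-N, N + 1)
                            for b in (a - 1, a + 1) if abs(b) <= N]
    print(N, sum(x for x, _ in pts) / len(pts))  # grows without bound with N
[/code]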

It’s not a matter of “getting rid of σ-additivity”, or alternatively defining a particular extension of the reals with infinitesimal probabilities, or any such thing; it’s a matter of not prematurely formalizing in the first place. There’s nothing to get rid of. We’ll just go where the problem takes us, and make what observations arise as relevant along the way. The problem never said anything about σ-additivity, or, for that matter, “real” numbers (in the jargon sense involving Archimedeanness), and if, in our exploration of the relevant mechanics of the problem, they never come up, then neither need we.

The relevant observations are simply that “E[Y | X] = 1.25X, so E[Y] = 1.25E[X]” amounts to tackling these squares column-wise, “E[X | Y] = 1.25Y, so E[X] = 1.25E[Y]” amounts to tackling them row-wise, and “E[X] = E[Y] by symmetry” amounts to tackling them diagonally; and, yes, the totals can change as one sums the squares under different groupings, because, just look, don’t think but look: that is indeed what happens on these squares (most clearly illustrated in the last one). It seems paradoxical because we’re all very used to regrouping making no difference in finite cases, but it can make a difference in infinite cases, as this very example shows, and what more is there to say?
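
For anyone without the squares in front of them, here’s the same regrouping done in code (a sketch; each square at (2^a, 2^b) is taken to carry the profit-from-switching value Y - X):

[code]
# The same squares, grouped three ways. The square at (2**a, 2**b) carries
# the profit-from-switching value Y - X.
profit = lambda a, b: 2.0**b - 2.0**a

for a in range(-2, 3):
    col  = profit(a, a + 1) + profit(a, a - 1)    # column: your envelope X = 2**a
    row  = profit(a + 1, a) + profit(a - 1, a)    # row: the other envelope Y = 2**a
    diag = profit(a, a + 1) + profit(a + 1, a)    # diagonal: the pair {2**a, 2**(a+1)}
    print(col, row, diag)  # every column > 0, every row < 0, every diagonal = 0
[/code]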

All of which can be perfectly well understood without any knowledge of real analysis or probability or even any math beyond grade-school arithmetic. I keep mentioning “conditionally convergent series”, but one needn’t have ever heard of them, either (and it’s not really convergence, as such, which is the issue; just the idea of re-arranging to a different value). To look at the squares is to readily rediscover for oneself all the relevant phenomena.

So your answer to the question is, if I may paraphrase, “ignore probability theory, it only gets in the way; what’s really important is that there is an infinite sum involved, even though the question was about the value of two envelopes”?