Can somebody explain the two envelope paradox to me

If X is the amount in the envelope in your hand, then Y may be .5X or Y may be 2X. There are no other possible values for Y. Let’s assume the probabilities for these two values are both 50%.

So now switch. You now have Y in your hand. If Y is .5X, then by switching to X, you stand to gain .5X dollars. If Y is 2X, then by switching to X, you stand to lose X dollars. So your best bet is now not to switch.

Which still yields the strange result that no matter which you picked first, the best bet then was to switch. But we no longer have the even stranger result that your best bet is always to switch, no matter how many times you’ve switched already.

How to take care of the remaining strange result? By realizing we have no basis for assigning that 50/50 probability that we did to Y’s being either .5X or 2X.

I suspect once again that I’m not at all clear on the topic myself.

No, what creates the paradox is the introduction of some further math along with the “expected value”.

Reality says otherwise.

If you would like to see for yourself, do an experiment. Put a value in an envelope and then double that in another, randomly choose an envelope, and then log whether the other envelope is 2x or .5x the envelope you chose. Repeat that a bunch of times.
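For anyone who wants to run that experiment without stuffing actual envelopes, here is a quick Python sketch (the value range is an arbitrary choice; any positive amounts behave the same):

```python
import random

def trial():
    # Put a value in one envelope and double it in the other,
    # then choose one of the two at random.
    small = random.uniform(1, 100)   # arbitrary range for the smaller amount
    envelopes = [small, 2 * small]
    random.shuffle(envelopes)
    chosen, other = envelopes
    return other == 2 * chosen       # True when the other envelope holds double

trials = [trial() for _ in range(100_000)]
frac_double = sum(trials) / len(trials)
print(f"other envelope held 2x the chosen amount {frac_double:.1%} of the time")
```

Over many repetitions the fraction settles near 50%, as the post describes.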

I just did - my statement is accurate.

As long as “X” represents the chosen envelope (as I specifically stated in my post) then the statement is accurate.

Yes, there must be something wrong with the statement about “expected value”.

The way you formulate that part of the problem IS the problem.

In a matter of probability, you use whatever knowledge you have.

Treat it as a real-world problem for a moment, rather than as an abstract puzzle. Suppose your brother-in-law, as part of a drunken birthday party, actually hands you the two envelopes. You see $100 and must decide whether to switch.

To give yourself the best winning chance, wouldn’t you take what you know about him into consideration? Is he a show-off who would put three C-notes into the envelopes, or too stingy for that? Without knowing such details, I can’t guess the chances, and 50%-50% might be as good a guess as any, but note that for any assumed pdf, the actual percentage is computable and will not in general be 50%.

You could get a similarly strange result even using a slightly different set-up with a genuine 50/50 probability. In fact, you can get it in the simplest set-up of all.

Suppose there is always one envelope with $1 and one envelope with $2, and you just flip a coin to determine which one is handed to you and called X, and which remaining one is called Y. That is, there’s a 50-50 chance between (X = $1, Y = $2) and (X = $2, Y = $1).

What’s the expected value of Y/X? Well, E[Y/X] = 50% * ($2/$1) + 50% * ($1/$2) = 1.25.

The expected value of Y/X is 1.25. But, symmetrically, the expected value of X/Y is also 1.25.

Crazy, huh? It seems as though it’s both true that X is expected to be bigger than Y and Y is expected to be bigger than X. But let’s not let the technical jargon “expected value” blind us to what’s really being said with that phrase. All this is saying is that the arithmetic (probabilistically-weighted) mean value of Y/X is 1.25, and the arithmetic mean value of X/Y is also 1.25. That’s ok; there’s no reason the arithmetic means of two reciprocal quantities have to be reciprocal.
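If anyone doubts it, the arithmetic in that coin-flip set-up is easy to check directly; a tiny Python sketch:

```python
# The two equiprobable assignments: (X = $1, Y = $2) and (X = $2, Y = $1).
outcomes = [(1, 2), (2, 1)]

e_y_over_x = sum(y / x for x, y in outcomes) / len(outcomes)
e_x_over_y = sum(x / y for x, y in outcomes) / len(outcomes)

# Both arithmetic means come out to (2/1 + 1/2) / 2 = 1.25.
print(e_y_over_x, e_x_over_y)  # 1.25 1.25
```

Both reciprocal ratios really do average to 1.25, with nothing contradictory about it.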

How did you come to the conclusion that you should switch?

Remember (and I was very clear about this in my post) - “X” represents the value in the envelope you chose. This value is potentially different each time you run the experiment.

We don’t choose envelopes based on “2X” or “.5X”, we choose them based on the actual value - which we have absolutely no way of knowing when the value is randomized each time.

Nobody is showing it’s false because it is true.

And we do know the probabilities as verified in real life. If you can run a substantial number of random trials that show my statement is false please post your results, I would be very interested.
The problem is not with making a statement like mine - the problem is how you use that information to create some “expected value” statement - THAT is where the problem is.

“How about paradoxically calculating E[Y - X | X] (the expected value of Y - X given the value of X, expressed as a function of X) instead?”, I hear you ask.

Well, that’s fine. And I’ll discuss this in my next post. But, first, I want to point out that, in general, E[A] is NOT equal to E[A | B = E[B]]. The notation is a bit hairy; what this is saying is that “the overall expected value of A is NOT necessarily equal to the expected value of A given that B is equal to the expected value of B”.

For example, suppose there are three equiprobable possibilities of (A = 0, B = 0), (A = 1, B = 1), and (A = 4, B = 2). What’s E[A | B]? Well, it’s B[sup]2[/sup]; no matter what B is, A is exactly equal to B[sup]2[/sup]. What’s E[B]? It’s 1, the arithmetic mean of 0, 1, and 2. What’s E[A | B = 1]? It’s 1; when B = E[B] = 1, then A is guaranteed to equal 1. But what’s the overall E[A]? It’s 5/3. E[A | B = E[B]] is NOT the same thing as E[A].
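A few lines of Python confirm the numbers in that three-point example:

```python
# Three equiprobable outcomes of (A, B).
points = [(0, 0), (1, 1), (4, 2)]

e_a = sum(a for a, b in points) / len(points)   # overall E[A] = 5/3
e_b = sum(b for a, b in points) / len(points)   # E[B] = 1

# E[A | B = E[B]]: average A over the outcomes where B equals E[B].
matching = [a for a, b in points if b == e_b]
e_a_given = sum(matching) / len(matching)       # = 1

print(e_a, e_a_given)  # 1.666... vs 1.0: the two are not equal
```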

This.

There are two possible games you are playing, and you cannot average the possible results from two different games to get an “expected value”.

If you open a $40 envelope, you could be playing a $20-$40 game, or a $40-$80 game.
If you switch your $40 envelope for an $80 envelope and ‘win’, you were playing the higher dollar game.
If you switch your $40 envelope for a $20 envelope and ‘lose’, you were playing the lower value game. The difference between the amount you win or lose is not determined by the envelope you chose; it was determined by the game you were playing.

The formula which gives you the 1.25x expected value is not appropriate to use in this situation.

How bizarre. Whenever I write E[B] with an uppercase B, it gets turned into a lowercase b.

Why not? Let’s take the envelopes out of it and set up the same numbers as a bet.

You have forty dollars. Somebody offers you a bet for your money. They’ll flip a coin. If it’s heads you win eighty dollars. If it’s tails you win twenty dollars. Should you take the bet?

Yes, because the expected value of the bet is fifty dollars. So the bet is worth more than the forty dollars you have.
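A quick Python simulation of that bet (stake and payouts as described above) shows the average payoff settling near $50, more than the $40 stake:

```python
import random

def play():
    # Heads pays $80, tails pays $20; either way your $40 stake is replaced.
    return 80 if random.random() < 0.5 else 20

payoffs = [play() for _ in range(100_000)]
mean_payoff = sum(payoffs) / len(payoffs)
print(mean_payoff)  # hovers around the $50 expected value
```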

The problem is that the probability of the second envelope containing 0.5X or 2X isn’t independent of the value of X. Hence, a simple linear calculation of expected value ((0.5 * 0.5X) + (0.5 * 2X)) gives an incorrect result.

This is more readily grasped in the case with a known upper bound. Say the smaller envelope contains any amount up to $100, with equal probability. (In other words, the smaller envelope is “uniformly distributed” between 0 and 100.) The amount in the larger envelope, it should be obvious, is uniformly distributed between 0 and 200.

Over many trials, half of all envelopes will be “small” envelopes, and half will be “large”. Of the small envelopes, half will be between 0 and 50, and all will be between 0 and 100. Of the large envelopes, half will be between 0 and 100, and half will be between 100 and 200.

Taking matters in reverse, all envelopes between 100 and 200 are larger envelopes. Of envelopes between 0 and 100 (three quarters of all envelopes), two-thirds are smaller envelopes and one-third are larger envelopes.

Now you hold your unopened envelope of unknown size. It may be larger or smaller, with equal probability. If X>100–which it will be 25% of the time, and you don’t yet know whether this is one of those times–then it is a larger envelope and the other envelope contains 0.5X. If X<100, then two-thirds of the time it is a smaller envelope, and the other envelope contains 2X. One-third of the time it is a larger envelope, and the other envelope contains 0.5X.

Again, the probabilities with respect to the other envelope (0.5X versus 2X) depend on the amount of X in the first envelope. And again, when that is the case, a linear multiplication of expected value won’t work. You must integrate over the probability distribution, which (if you do the math) gives you the correct expected value for the other envelope of 1X.
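A Python sketch of this bounded set-up (smaller envelope uniform on (0, 100), as above) illustrates both points: the 2x-versus-0.5x odds shift with the amount you hold, and on average the other envelope is worth the same as yours, not 1.25 times as much:

```python
import random

def trial():
    small = random.uniform(0, 100)   # smaller envelope: uniform on (0, 100)
    large = 2 * small                # larger envelope: uniform on (0, 200)
    if random.random() < 0.5:
        return small, large          # (chosen, other)
    return large, small

trials = [trial() for _ in range(200_000)]

# The naive formula predicts the other envelope averages 1.25x yours;
# in fact the two overall averages match (both near 75).
mean_chosen = sum(c for c, o in trials) / len(trials)
mean_other = sum(o for c, o in trials) / len(trials)
print(mean_chosen, mean_other)

# Above $100 the other envelope is always the smaller one...
over = [(c, o) for c, o in trials if c > 100]
frac_half_given_big = sum(o < c for c, o in over) / len(over)

# ...and below $100 it is the larger one about two-thirds of the time.
under = [(c, o) for c, o in trials if c < 100]
frac_double_given_small = sum(o > c for c, o in under) / len(under)
print(frac_half_given_big, frac_double_given_small)
```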

Now generalize to the case with an unknown upper bound. This is very difficult to grasp, because if someone tells you, “Pick a number–any number!” it isn’t obvious that you have an upper bound, or that you will be picking from any sort of distribution at all. But you must be. You can’t program a computer to produce any number between zero and infinity, with equal probability. It can’t be done. Nor can you program your own mind to produce such a number. A number must come from a distribution, even if that distribution is very ambiguous (“I picked the first thing that came into my mind”) and difficult to quantify. And, when you double it or halve it, the opposite number comes from a different distribution.

Which means, again, that “larger” numbers are more likely to be associated with larger envelopes (and 0.5X opposites), and “smaller” numbers are more likely to be associated with smaller envelopes (and 2X opposites). So again, a simple linear calculation of expected value will lead you astray. You must integrate over the whole, nebulous, ambiguous distribution, and when you do, you get the unexciting expected value for the opposite envelope of 1X.

This is not the same game. I was actually going to use this example to show the difference but thought it might confuse matters.

In your game, you might win $80 or $20.
In the envelope game, one of those results is impossible. If the game is the 40-80 game, a $20 result is impossible. If the game is the 20-40 game, an $80 result is impossible.
The fact that you do not know which game you are playing does not change the fact that some results are impossible.

Focusing on “You can’t have a probability distribution on the integers in which every integer is equiprobable” is a red herring; this depends on a certain technical account of what a probability distribution is. We could instead, for example, choose to define the probability P(B | A), for sets of integers B and A, as the asymptotic density of B in A [that is, as the limit, as n goes to infinity, of P[sub]n[/sub](B[sub]n[/sub] | A[sub]n[/sub]), where P[sub]n[/sub] is the uniform distribution on the interval [-n, n], and X[sub]n[/sub] for any set X is the intersection of X with that interval].

Does this satisfy Kolmogorov’s axioms? No, although it is a finitely additive distribution, it nonetheless violates certain instances of countable additivity (including the one everyone keeps pointing out; every singleton has probability 0, though the set of all integers (which is the union of these disjoint singletons) has probability 1). But it’s still a perfectly cromulent thing in itself, and it models just fine one intuition about what ordinary language “probability” could mean, and in particular what ordinary language “pick an integer at random” means. So we can still study it and see what happens with it. It’s interesting to see how it does and doesn’t match up with Kolmogorov’s axioms, but Kolmogorov’s axioms aren’t the end-all, be-all of conceptualizations of probability.
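If a concrete picture helps, here is a small Python sketch of that asymptotic-density notion, approximating P(B | A) by truncating to the interval [-n, n]; the particular choice of B and A below is just an illustration:

```python
def density(in_b, in_a, n=10_000):
    # Approximate P(B | A) as the fraction of members of A within [-n, n]
    # that also belong to B; the true density is the limit as n grows.
    a = [k for k in range(-n, n + 1) if in_a(k)]
    return sum(1 for k in a if in_b(k)) / len(a)

# For example, the density of multiples of 3 among all integers is 1/3.
p = density(lambda k: k % 3 == 0, lambda k: True)
print(p)
```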

I’m not seeing that. The guy in my game is only going to flip the coin once. Obviously it won’t come up both heads and tails so one of those results is impossible.

Suppose he flipped the coin into a box and neither one of you could see it land. And then he offered his bet. If you accepted the bet, the two of you would open the box and see which side the coin was lying on. In this case, like the envelope case, the outcome is predetermined and you’re just betting on your lack of knowledge of that outcome.

The auto-editor is boldly trying to interpret this as a tag, I think. Try messing it up with another tag: [****B]

It’s worse if you add an innocuous [/**b] later in the post - it’ll get paired up with that converted tag.

(I typed that using [[**b][/**b]B], though [****i] would work as well.)

I offer this resolution to the paradox for approval:

An integer g is picked uniformly at random [on some account of uniform, but whatever]. Two envelopes are filled with 2[sup]g[/sup] and 2[sup]g+1[/sup], respectively. Then a coin is flipped to determine which envelope’s contents are called X and which are called Y.

What is the apparent paradox?

[ul]
[li]The first apparent paradox is that E[Y | X] = 1.25X, but simultaneously, symmetrically, E[X | Y] = 1.25Y.[/li]
What’s so paradoxical about that? Well, for one thing, it implies that both E[Y/X] and E[X/Y] are 1.25; we might’ve expected these to be reciprocals instead. But this is no paradox; we saw above that the same thing happens in even the simplest of cases. In general, the expected value operator does not respect reciprocation.
[li]How about the fact that since both E[Y/X] and E[X/Y] are greater than 1, it is apparently implied that both “You should expect Y to be greater than X” and “You should expect X to be greater than Y”?[/li]
This amounts to a misunderstanding of what the technical term “expected value” means; it just refers to an arithmetic mean, and it is what it is. It needn’t imply anything about what one expects to happen, and there’s no use naively trying to maintain an expected picture of the world in which everything is equal to its expected value (as illustrated by all the usual things; a man with less than 2 legs is rather unexpected, the expected value of A * B is generally not the product of the expected values of A and B, etc.).
[li]Fine. But, returning to E[Y | X] = 1.25X and E[X | Y] = 1.25Y, there’s one more apparent bit of paradox: from this we can conclude E[Y] = E[E[Y | X]] = E[1.25X] = 1.25E[X], and symmetrically, E[X] = 1.25E[Y]. Thus, we have that E[X] = (1.25)[sup]2[/sup]E[X]. Isn’t that paradoxical?[/li]
Well, it may seem odd, but there are of course solutions to that equation. It’s just that E[X] (given that it is positive, as X is always positive) has to be positively infinite. Essentially, in a roundabout way, we’ve demonstrated that the expected value of 2[sup]g[/sup], for a uniformly random integer g, is infinitely large, because E[2[sup]g[/sup]] = E[2 * 2[sup]g-1[/sup]] = 2E[2[sup]g-1[/sup]] = 2E[2[sup]g[/sup]]. So E[X] and E[Y] are both ∞, and there is no contradiction in E[X] = 1.25E[Y] and simultaneously E[Y] = 1.25E[X].
[li] But how about the fact that E[Y | X] = 1.25X apparently implies that E[Y - X | X] = 0.25X, so that E[Y - X] = E[0.25X] = 0.25E[X] > 0, while simultaneously, symmetrically, E[X - Y] > 0, which means we must also have E[Y - X] < 0?[/li]
The problem here is that E[Y - X] is not actually well-defined, in the sense that it involves the sum of a collection of numbers of differing signs, with infinitely large positive and negative components. Specifically, we have that E[Y - X] = 1/2 * E[Y - X | X < Y] + 1/2 * E[Y - X | X > Y] = (E[Y - X | X < Y] + E[Y - X | X > Y])/2 = (E[0.25X] + E[-0.25X])/2 = (E[X] + -E[X])/8. The numerator there is E[X] + -E[X]; naively, this goes to 0, but when adding infinitely large sums to their negation, the sum can be re-arranged to go any which way one likes, including both to positive and to negative values.

In particular, letting 1/Z be the nominal probability of any particular integer value of g, we have that E[X] = (… + 1/8 + 1/4 + 1/2 + 1 + 2 + 4 + 8 + …)/Z, so that E[X] + -E[X] = [(… + 1/8 + 1/4 + 1/2 + 1 + 2 + 4 + 8 + …) - (… + 1/8 + 1/4 + 1/2 + 1 + 2 + 4 + 8 + …)]/Z. The numerator there can be re-arranged many different ways to produce different sums; for example:
[LIST]
[li]As … + (1/4 - 1/4) + (1/2 - 1/2) + (1 - 1) + (2 - 2) + (4 - 4) + … = … 0 + 0 + 0 + 0 + 0 + …, which is zero.[/li]
[li]As … + (1/4 - 1/8) + (1/2 - 1/4) + (1 - 1/2) + (2 - 1) + (4 - 2) + … = … + 1/8 + 1/4 + 1/2 + 1 + 2 + …, which is infinitely positive.[/li]
[li]As … + (1/4 - 1/2) + (1/2 - 1) + (1 - 2) + (2 - 4) + (4 - 8) + … = … - 1/4 - 1/2 - 1 - 2 - 4 - …, which is infinitely negative.[/li][/LIST]

So the question of whether E[Y - X] is positive, negative, or zero is ambiguous in the same way that the question of whether that sum is positive, negative, or zero is ambiguous.

(That having been said, it is possible to give a natural account of infinite summation on which that sum, as originally set up in the calculation of E[Y - X], is simply considered to go to 0 (with the shifted versions going to different values no longer being considered the same sum), which I could outline in a future post if there is interest.)
[/ul]
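The rearrangement business is easy to see numerically. A Python sketch, subtracting two truncated copies of the divergent series under three different pairings (truncating g to [-N, N] is my own stand-in for the infinite sum):

```python
# Two copies of ... + 1/4 + 1/2 + 1 + 2 + 4 + ..., truncated at g in [-N, N],
# subtracted term by term under three different pairings.
N = 20
terms = [2.0 ** g for g in range(-N, N + 1)]

# Pair each term with its own copy: every pair cancels, total is zero.
aligned = sum(t - t for t in terms)

# Pair 2^g with 2^(g-1): every pair is positive, and the total grows with N.
shifted_up = sum(terms[i] - terms[i - 1] for i in range(1, len(terms)))

# Pair 2^g with 2^(g+1): every pair is negative, and the total plunges with N.
shifted_down = sum(terms[i] - terms[i + 1] for i in range(len(terms) - 1))

print(aligned, shifted_up, shifted_down)
```

Same terms, three pairings, three very different answers: zero, large positive, large negative, growing without bound as N increases.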

Ah, of course. I forgot all about that.

I think this highlights what some people are saying, which is that an “expected value” is only a guide for action in a particular way in particular situations – and the scenario described in the OP is either not one of those situations or is one in which “expected value” is being used in the wrong way.

In your example, I would have thought that expected value is being used the wrong way. But if the choice were between taking “X/Y dollars” or “Y/X” dollars, then I’d think it would be rational to use expected value to make your decision–except of course using expected value here just shows that (since the expected values are equal) you’ve got no good evidence either way.

I’m not sure what you are referring to with this. To clarify, I would not want to gloss E[Y/X] = 1.25 with the phrase “Y is expected to be bigger than X”. I would consider that an erroneous interpretation of the misleadingly named “expected value” operator, and was using it as an illustration of such. (However, it is definitely true that E[Y/X] = 1.25; there’s nothing wrong about that.)

The problem here is that there’s a distinction between the way you’d read this problem in ordinary language, and how you have to read it in order to get a genuine probabilistic model out of it. You can’t really make a random variable X that behaves exactly the way that you’re describing in this post, and that’s why there appears to be a paradox.

In the case where you don’t open the envelope before you decide to switch, C K Dexter Haven’s analysis is exactly correct, and can be translated into something consistent. In the case where you do, you can follow the Bayesian approach that seems to be popular in this thread–although that opens a giant can of worms regarding prior selection–or you can take Indistinguishable’s conditional expectation approach, which I find a lot more appealing. Either way, the paradox disappears when you write down something that actually corresponds to a formal model.

If I can get on my soapbox for a minute, there is a school of thought that says that probability should be handled by intuition, and that doing formal calculations is pointless. I think this is a great example of why you’d better make sure your intuition is damn good before you take that approach.