The calculation which says you should take Y instead of X in the two envelope problem is an expected value calculation. What is it? 0.5*(X/2) + 0.5*(2X) = 1.25X. It is assumed that this gives E(Y). Why? 0.5 is a probability, but the probability of what? The definition of E(Y) is the sum of P(Y=y)*y over all possible y. We must therefore have the probabilities P(Y = X/2) and P(Y = 2X). These are assumed to be 0.5 and 0.5. Why? Is it because you could have picked the larger of the two envelopes or the smaller with a 50:50 chance? That is already covered by the random variable U. Let’s look at X and Y in terms of Z and U again.
X = Z + UZ
Y = 2Z - UZ
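If you want a quick sanity check of these two equations before the algebra, here is a minimal simulation sketch. The particular distribution I give Z below is an arbitrary example of mine, since the problem never tells us what it is.

```python
import random

def simulate(n_trials=1_000_000):
    # Z is the smaller of the two amounts. Its distribution here is an
    # arbitrary example; the problem never specifies it.
    z_values = [1, 2, 4]
    total_x = total_y = total_z = 0.0
    for _ in range(n_trials):
        z = random.choice(z_values)   # smaller amount
        u = random.randint(0, 1)      # 1 if we happened to pick the larger envelope
        x = z + u * z                 # X = Z + UZ, the envelope we hold
        y = 2 * z - u * z             # Y = 2Z - UZ, the other envelope
        total_x, total_y, total_z = total_x + x, total_y + y, total_z + z
    print("E(X) is roughly", total_x / n_trials)
    print("E(Y) is roughly", total_y / n_trials)
    print("1.5 * E(Z) is roughly", 1.5 * total_z / n_trials)

simulate()
```

Both averages come out the same, which is the first hint that "always swap" can't be right.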
Calculating E(Y) from these two equations (using the independence of U and Z, so E(UZ) = E(U)E(Z) = 0.5E(Z)) gives E(Y) = 2E(Z) - 0.5E(Z) = 1.5E(Z). We also have E(X) = 1.5E(Z), so E(X) and E(Y) are equal. How could we ever get E(Y) = 1.25X? We want the answer in terms of X, not E(X). To do that we have to use “the value that X takes”, rather than treating X as a random variable whose value is yet to be decided. In other words, we have to make X “something”: we write everything in terms of that value of X, and we condition every probability on it. This is written with notation like P(Y|X) and E(A|B) and is called conditional probability or conditional expectation. We want to find E(Y|X) in terms of X, and see whether it is 1.25X.
X = Z + UZ
Y = 2Z - UZ
X + Y = 3Z
Let’s take the conditional expectation of each equation, given X. Note that once we condition on X we are treating it as a fixed value rather than a random one, so E(X|X) = X.
X = E(Z|X) + E(UZ|X)
E(Y|X) = 2E(Z|X) - E(UZ|X)
E(Y|X) = 3E(Z|X) - X
Do any of them show that E(Y|X) = 1.25X? No. We do have E(Y|X) = 3E(Z|X) - X. What does E(Z|X) mean? It’s the expected value of the smaller of the two envelopes, given that you know the value of one envelope. Now, if we know that one envelope contains X, the other envelope must contain either X/2 or 2X. The smaller envelope in those two cases is X/2 and X respectively. From this we can expand E(Z|X) into P(Z = X/2|X)*X/2 + P(Z = X|X)*X. Now, are those probabilities 0.5 and 0.5? Why should they be? We know that U behaves that way; maybe those probabilities come straight from U. Let’s try to write them in terms of U.
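Before getting into that algebra, here is a quick numerical look at what those probabilities are for one concrete choice of Z. The distribution (uniform on $1, $2 and $4) is made up by me purely for illustration; the point is only that, once Z has some definite distribution, P(Z = c | X = c) is something you can estimate, and it is not automatically 0.5.

```python
import random
from collections import defaultdict

z_values = [1, 2, 4]                  # arbitrary example distribution for Z
counts = defaultdict(lambda: [0, 0])  # for each observed X value c: [times Z == c, total]

for _ in range(1_000_000):
    z = random.choice(z_values)       # the smaller amount
    u = random.randint(0, 1)          # 1 if we happened to pick the larger envelope
    x = z + u * z                     # the amount in the envelope we hold
    counts[x][0] += (z == x)
    counts[x][1] += 1

for c in sorted(counts):
    hits, total = counts[c]
    print(f"P(Z = {c} | X = {c}) is roughly {hits / total:.2f}")
```

Even in this simple example the probability is 1 at the bottom ($1 can only be the smaller envelope), 0 at the top ($8 can only be the larger one), and 0.5 only in the middle.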
Bayes’ theorem is extremely useful when working with conditional probabilities. It states that P(A|B) = P(B|A)*P(A)/P(B). In our case, we know X in terms of Z very well, but it is a bit harder to work with Z in terms of X. Let’s apply Bayes’ theorem to convert things into a more usable form.
P(Z|X) = P(X|Z)*P(Z)/P(X)
To make things a bit clearer later on, I’ll make “the value X takes” explicit by writing X = c, where c is the “something” we were assuming X to be.
P(Z = c|X = c) = P(X = c | Z = c)*P(Z = c)/P(X = c) (since, given X = c, the event “Z = X” is just the event “Z = c”, the superfluous equalities drop out)
What is P(X = c | Z = c)? It is asking “what is the probability that X takes some value, given that Z is that same value”. Since X = Z + UZ, if Z = c then X = c exactly when U = 0, which happens with 50% probability. Therefore P(X = c | Z = c) = 0.5.
Another term is P(X = c). What is this? The probability that X is c. How can X be c? Either Z was c and X was the smaller of the two envelopes, or Z was c/2 and X was the larger of the two. This involves both Z and U. U is independent of Z, so it can be written out as P(Z = c)*P(U = 0) + P(Z = c/2)*P(U = 1). P(U = 0) = P(U = 1) = 0.5, so we can write this as 0.5P(Z = c) + 0.5P(Z = c/2).
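If you would rather check that decomposition than take it on faith, here is a small enumeration sketch. Again, the distribution chosen for Z is an arbitrary example of mine.

```python
from fractions import Fraction

# An arbitrary example distribution for Z (the smaller amount).
p_z = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}

# Enumerate (Z, U) exactly; U is 0 or 1, each with probability 1/2.
p_x = {}
for z, prob_z in p_z.items():
    for u in (0, 1):
        x = z + u * z
        p_x[x] = p_x.get(x, Fraction(0)) + prob_z * Fraction(1, 2)

# Compare with 0.5*P(Z = c) + 0.5*P(Z = c/2) for every possible value c of X.
for c in sorted(p_x):
    formula = Fraction(1, 2) * p_z.get(c, Fraction(0)) + Fraction(1, 2) * p_z.get(c / 2, Fraction(0))
    print(c, "exact:", p_x[c], "formula:", formula, "match:", p_x[c] == formula)
```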
Finally, we have P(Z = c). This is a simple statement of Z’s distribution. It is the probability that Z is c. We don’t know it, so we have to leave it as is.
Putting the pieces into Bayes’ theorem, and remembering that X = c, P(Z = c|X = c) = 0.5P(Z=c)/(0.5P(Z=c) + 0.5P(Z=c/2)) = P(Z=c)/(P(Z=c)+P(Z=c/2)), since the factors of 0.5 cancel. Likewise P(Z = c/2|X = c) = P(Z=c/2)/(P(Z=c) + P(Z=c/2)). What does this mean? The other envelope contains 2c when Z = c and c/2 when Z = c/2, so E(Y|X=c) = 2c*P(Z = c|X = c) + (c/2)*P(Z = c/2|X = c) = c*(2P(Z = c) + 0.5P(Z=c/2))/(P(Z=c) + P(Z=c/2)). If P(Z = c) = a and P(Z = c/2) = b then E(Y|X=c) = c*(2a+0.5b)/(a+b). The only probabilities we need to find E(Y|X) are of the form P(Z = something). Whoops. We said we don’t know the distribution of Z. Z could have any distribution, and the answer for E(Y|X) changes depending on what it is.
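To see how strongly the answer depends on the distribution of Z, here is a sketch that plugs a couple of entirely made-up distributions for Z into that formula and prints E(Y|X=c) for every possible value c of X.

```python
def swap_factor(p_z, c):
    """E(Y | X = c) / c, using (2a + 0.5b)/(a + b) with a = P(Z = c), b = P(Z = c/2)."""
    a = p_z.get(c, 0.0)
    b = p_z.get(c / 2, 0.0)
    return (2 * a + 0.5 * b) / (a + b)

# Two made-up example distributions for Z; the problem hands us neither.
examples = {
    "Z uniform on $1, $2, $4": {1: 1/3, 2: 1/3, 4: 1/3},
    "Z mostly small":          {1: 0.7, 2: 0.2, 4: 0.1},
}

for name, p_z in examples.items():
    print(name)
    possible_x = sorted({z for z in p_z} | {2 * z for z in p_z})
    for c in possible_x:
        print(f"  E(Y | X = {c}) = {swap_factor(p_z, c):.3f} * {c}")
```

The factor is 1.25 only at values of c where P(Z = c) happens to equal P(Z = c/2); at other values it is 2, or 0.5, or something in between, so there is no single "swap is worth 1.25X" answer.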
Wait a minute, what about that often-repeated assumption that we can treat things as equally likely if we have no reason not to? We assume that a coin is 1:1. We assume that a 6-sided die has a 1/6 probability of showing 1. Why not assume that P(Z=c) = P(Z = c/2)? (Note that we need it to be true for every c.) If we do, we get E(Y|X) = 1.25X. Why not, you may ask. I’ll answer why: almost no distribution has P(Z=c) = P(Z = c/2) for all c. Look at our die. Let c be the number the die shows. If c is 2, it works perfectly: P(2) = P(1). However if c = 1, we need P(0.5), which is 0. Whoops. Odd values of c break it, so the assumption is false for a die. But let’s ignore that. Let’s just say a die isn’t representative, that our “everything is equal” assumption is true, and that Z has a distribution that satisfies it. Surely plenty of distributions do! Well, think about what the assumption demands: if P(Z=c) is some positive number p for any amount c, then P(Z=c/2), P(Z=c/4), P(Z=c/8) and so on must all equal p as well, and infinitely many values each with probability p can’t add up to 1. So no positive amount can have positive probability at all, which leaves exactly one distribution.
Z = 0
What does that mean? It means the envelopes are empty. Then yes, it means that you can expect the other envelope to have 1.25 times what you have. Because both are empty. 1.25 * 0 = 0.
Now let’s use the example: X = 1. Is E(Y|X=1) = 1.25? Sure, in the way that anything follows from a contradiction: we now have money in an empty envelope. If Z is 0, then X is 0. It can’t be 1.
So we have a choice. We can assume that Z fits the one distribution that makes the 50:50 assumption work, which forces Z = 0, or we can refuse, in which case we can’t pin down E(Y|X) at all. Those aren’t quite all the options, though. There is also the option that Z is a continuous random variable. I suggest this is silly for money in envelopes, but pursue it if you want. I won’t, since I’m not a mathematician and working with continuous distributions and the associated integrals is beyond my current knowledge. If you can show that my general logic breaks down for integrable continuous random variables, I’d be happy to be proven wrong. There is one more option: that E(Z) is undefined. In that case, E(Y) = 1.25E(X) is kinda true. It’s also “true” that E(Y) = 0.000000000001E(X). This can happen with distributions that have no finite expected value. Again, I will suggest that this is a bit silly for money in envelopes.
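For completeness, here is the kind of thing the "E(Z) is undefined" option looks like. The distribution below is not derived from anything above, just a standard illustration I’m borrowing: giving Z probability proportional to (2/3)^n at the amount 2^n produces an infinite expected value, and plugging it into the earlier formula makes the other envelope look better no matter what value you are holding, even though by symmetry swapping cannot actually help.

```python
from math import log2

def p_z(value):
    """A made-up heavy-tailed distribution: P(Z = 2**n) = (1/3) * (2/3)**n for n = 0, 1, 2, ...
    Its expected value diverges: the terms 2**n * P(Z = 2**n) = (1/3)*(4/3)**n grow without bound."""
    if value < 1:
        return 0.0
    n = log2(value)
    return (1 / 3) * (2 / 3) ** int(n) if n == int(n) else 0.0

for c in [1, 2, 4, 8, 16]:
    a, b = p_z(c), p_z(c / 2)
    print(f"E(Y | X = {c}) = {(2 * a + 0.5 * b) / (a + b):.2f} * {c}")
```

Every line says "swap", which is exactly the "kinda true" situation described above, and exactly why I call it a bit silly for money in envelopes.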
What does this all mean? It means that if you open an envelope and find $1, you can’t say “I have an equal chance of getting $2 or $0.50 if I swap” unless you know the relative probabilities that the envelopes hold ($1, $2) versus ($0.50, $1). You have no idea what those probabilities are. I certainly don’t. Remember that these probabilities are purely about what was put into the envelopes to begin with; they have nothing to do with whether you picked the larger or the smaller one. If you assume those probabilities are equal no matter what you find, then you are either assuming that the envelopes were empty (and that finding $1 is impossible), or that the money is some weird random variable with no finite expected value. Neither really fits the fact that there is $1 in the envelope you opened.
Of course, I’m assuming that you have no idea what the initial money-in-envelope distribution is, apart from it being discrete (and probably not zero with 100% probability). What happens if you do have some knowledge? For example, let’s say you are looking at an envelope with $1 in it, and a fly on the wall flies down and whispers in your ear, “I’ve been watching him do this for a long time. He puts $1 and $2 in the envelopes as often as he puts in $1 and $0.50.” If you believe the fly, then you should take the other envelope. Why? Because he is saying that P(Z = 0.5) = P(Z = 1), which gives E(Y|X=1) = 1.25. Great! What if the fly instead said, “He only ever puts in whole dollars”? Then P(Z = 0.5) = 0, and E(Y|X=1) isn’t 1.25; it’s 2. If the fly said, “He always puts half a dollar in one of the envelopes”, that means P(Z = 1) = 0 and E(Y|X=1) = 0.5. Don’t swap. He could even say “P(Z = 0.5) = 0.00034 and P(Z = 1) = 0.00029”, in which case it’s a slightly harder calculation: E(Y|X=1) comes out to roughly 1.19 and you should swap.
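Each of the fly’s statements is just a pair of values (or at least a ratio) for P(Z = 1) and P(Z = 0.5), so the earlier formula turns them directly into numbers. A small sketch:

```python
def expected_other(c, p_z_c, p_z_half_c):
    """E(Y | X = c) = c * (2*P(Z = c) + 0.5*P(Z = c/2)) / (P(Z = c) + P(Z = c/2))."""
    a, b = p_z_c, p_z_half_c
    return c * (2 * a + 0.5 * b) / (a + b)

# You opened an envelope holding $1, so c = 1, a = P(Z = 1), b = P(Z = 0.5).
# Only the ratio between a and b matters here, so these are relative weights,
# not complete distributions.
fly_statements = {
    "($1,$2) as often as ($0.50,$1)":    (0.5,     0.5),
    "only ever whole dollars":           (1.0,     0.0),
    "always $0.50 in one envelope":      (0.0,     1.0),
    "P(Z=1)=0.00029, P(Z=0.5)=0.00034":  (0.00029, 0.00034),
}

for statement, (a, b) in fly_statements.items():
    print(f"{statement}: E(Y | X = 1) = {expected_other(1, a, b):.2f}")
```

The four lines come out to 1.25, 2.00, 0.50 and roughly 1.19, matching the reasoning above: whether you should swap depends entirely on what you believe about how the envelopes were filled.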