Can somebody explain the two envelope paradox to me

No. X’ is the amount in this envelope if it’s smaller than the other; X’’ is the amount in this envelope if it’s larger than the other.

What confuses is to distinguish X, X’, X’’ rather than assuming, as the fallacy does, that X = X’ = X’’.
The reason is that X, X’, X’’ are expected values and such expectations are subject to conditioning.

I’m a little surprised OP continues to ask for help. Unnecessary obfuscations may have contributed to his confusion.

Surely OP will be happy to stipulate that there are only a finite number of possible envelope money amounts. This stipulation avoids any unusual measuring paradox, if one is otherwise possible.

I’m curious whether anyone (except Little Nemo!) disagrees with the following:

septimus’s resolution of the paradox is correct, clear, and easy to understand.

Ok, I’ll say this again without symbols

The answer to the problem is that in the scenario provided one of the following must be true, all of which naively seem absurd:

  1. The expected value of the envelope is infinite or undefined, and thus there is nothing inherently wrong with each envelope having a larger expected value than the other.
  2. The probability of the other envelope having the larger amount is not actually 50%, voiding the expected value argument.
  3. OpalCat has a third envelope.

The reason that one of these things must be true requires understanding the actual probabilities involved. You might as well ask why the positively charged nucleus doesn’t just pull the electrons into itself; the basic answer is “quantum mechanics”, and any more detailed answer requires at least some understanding of the forces at work.

Indistinguishable also provided a very nice diagram of what happens in the first scenario. Trying to determine the expected value of the envelopes under all the assumptions the problem makes leads to strange things happening.

If you don’t get it still, consider the following: Do you understand why all the assumptions in the problem lead to the conclusion that all possible positive values are equally likely? Given that, consider the expected value of one envelope in isolation. The only sensible ways to define an expected value for such a distribution make it out to be infinite; what finite value could it possibly be? If the expected value is infinite, do you understand why you can’t work with it like you would any finite number?

Most likely, it’s the first one of those questions that is the sticking point, and the only way to understand that is to understand the probability theory behind it. The problem is relying on one’s naive probability theory, especially when it comes to conditional probabilities.

Here are a couple of examples of naive probability going wrong:

Let’s say you are tested for a rare disease. 0.05% of all people (1 out of every 2000) have it. If you have the disease, it will always be detected. If you don’t have the disease, there’s a 1% chance it will be detected anyway. If your test comes back positive, what’s the chance you have the disease? Most people, even doctors, will fail to realize that because the disease is so rare, it will be much, much more likely that any positive result is due to a bad test reading than due to the disease actually being there. Out of every 2000 people tested, on average 1 will have the disease and about 20 others will get false positives. Thus there’s roughly a 20/21 chance that if you get a positive result, you’re not actually infected.
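For anyone who wants to check that arithmetic exactly, here is a quick sketch in Python. The variable names are mine and purely illustrative; the numbers are just the ones stated in the example (1 in 2000 prevalence, no false negatives, 1% false positives).

```python
from fractions import Fraction

# Numbers from the example above; the variable names are illustrative.
prevalence = Fraction(1, 2000)           # 0.05% of people have the disease
sensitivity = Fraction(1)                # the disease is always detected
false_positive_rate = Fraction(1, 100)   # 1% of healthy people test positive

# Total probability of a positive test (true positives + false positives).
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate

# Bayes' rule: P(disease | positive test) and its complement.
p_disease_given_positive = prevalence * sensitivity / p_positive
p_healthy_given_positive = 1 - p_disease_given_positive

print(p_disease_given_positive, float(p_disease_given_positive))
print(p_healthy_given_positive, float(p_healthy_given_positive))
```

The exact chance of being healthy after a positive result comes out to 1999/2099, which agrees with the rounded 20/21 figure to about three decimal places.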

I read once about a woman convicted of killing her third child after it was found dead due to SIDS or crib death or something else unexplained, and her previous two had died under similar circumstances. The prosecutors said that it was so unlikely that someone would have 3 children die in this way that she must be killing them. When probability experts heard about this, they were absolutely enraged. While it is incredibly unlikely for a parent to have 3 children die in this manner, what’s the overall probability that a parent who has multiple children die (regardless of the reason) has a third child and kills it intentionally? That is incredibly unlikely too, and it is those two unlikely explanations that have to be weighed against each other. There was absolutely no evidence to believe that the parent killed her child other than the probability argument. If the prosecution had something to go on that indicated that the woman was more likely than the average person to have killed the children, they might have had a case.

Since it’s possible that crib death can happen, it’s possible that it happens to someone 3 times. Do we automatically convict the parents based on only the fact that they were incredibly unlucky? One might as well say that whoever wins the lottery must have cheated since it was so incredibly unlikely for that person to win, without considering the fact that there were so many entries, someone was likely to win. The flip side is partly the reason why employees and family members are often excluded from random drawings; their relationship with the host makes it more likely than normal that they will have benefited from something shady going on and the hosts don’t want to have the appearance of impropriety.

The crux of the matter is that naive probability does not always work. Humans only evolved to make certain probabilistic assessments instinctively, and very often will be led to draw the wrong conclusion by not carefully considering the matter at hand.

It occurs to me that some of you may have drawn conclusions about the problem, which neither I nor, IMO, Little Nemo drew.

I am certain that my explanation of the flaw in Nemo’s Paradox is correct, given an ordinary interpretation. glowacks, my question wasn’t rhetorical. If you claim my answer is incorrect or unclear, I want to know.

This is where you lose me. All three statements are false, in the commonsense model, which is mine and Nemo’s. Can you point to a phrase in OP itself which supports this “one of three must be true”?

I still don’t get it. What assumptions are you talking about?

glowacks, this is the expected value of the other envelope:
(M+2M)/2

Period.

I’m with septimus on this, no idea why you think one of those 3 statements must be true.

The only complicating factor is when you try to write the expected value in terms of “X”. If you avoid doing that (which has very little if any real world value) then all is good.

And the expected value of your current envelope is also (M+2M)/2, right?

Meaning expected values don’t give you the kind of guidance the OP expected them to, right?

I think what RaftPeople is saying about M is the same as my original point, which I now think was a good point after all.

If you call the value of the envelope in your hand X, then sure it’s true that the other envelope might have 2X in it or it might have X/2 in it. But it’s not a good idea to take the average of 2X and X/2 as guidance concerning whether you should switch. Because if the other envelope has 2X in it, you’ve got M in your hands which means the other envelope has 2M, while if the other envelope has X/2 in it, you’ve got 2M in your hands which means the other envelope has M in it. The ‘X’s in ‘2X’ and ‘X/2’ stand for two different values: M in one case, 2M in the other. So averaging the expressions “2X” and “X/2” does not give you what one would intuitively call an “expected value” for the other envelope. For it to give what one would intuitively call an “expected value,” ‘X’ would need to stand for the same quantity each time it is used to describe a single trial.
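Maybe a tiny worked example says it more clearly. This is only a sketch in Python with a made-up value of M = 10 (any positive M behaves the same way); the names are mine, not anything from the original problem.

```python
# Hypothetical amounts: M is the smaller envelope, 2M the larger.
M = 10
cases = [(M, 2 * M), (2 * M, M)]  # (mine, other), each with probability 1/2

# Honest averages over the two equally likely cases.
expected_mine = sum(mine for mine, _ in cases) / len(cases)
expected_other = sum(other for _, other in cases) / len(cases)

# The fallacious step: treat the other envelope as worth 1.25 * X in every
# case, even though X is M in one case and 2M in the other.
naive_other = sum(1.25 * mine for mine, _ in cases) / len(cases)

print(expected_mine, expected_other, naive_other)
```

Both honest averages are 1.5M (15 here), while the "1.25 times X" recipe averages out to 18.75, a number that does not correspond to anything actually in either envelope.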

Gah why can’t I say it more clearly? Sorry…

Indistinguishable’s point shouldn’t be lost: There’s nothing magical about expected value. It’s just another word for weighted average. There’s nothing rationally compelling you to make the choice that has the greatest expected value or anything like that.

Still, I’d say that’s not the most relevant thing to say about this situation. It’s true that there’s nothing rationally compelling me to use weighted averages to decide what choice to make in any arbitrary case. Nevertheless, there do seem to be cases in which weighted averages can be used that way productively, and cases in which weighted averages can’t be so used. I think the issue in this thread is that some people want to use a weighted average for this purpose, in a case in which, in fact, that weighted average is not, in the context of the problem, the kind of thing to use for this purpose. So the thing to get clear about isn’t just “you don’t always have to choose the thing with the highest weighted average” but rather “this is a case in which weighted averages don’t help you the way you’re used to, and here’s why.”

Based on the standard axioms of decision theory, the only rational choice is to take the action with the highest expected utility, so it’s absolutely relevant here. The point is that switching and not switching both have the same expected payoff, so there’s no reason to prefer one to the other.

I wrote a quick simulation in R to illustrate what’s going on here. First, the code:



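rm(list = ls())

N <- 10000
x <- rep(NA, N)  # amount in the envelope I chose
y <- rep(NA, N)  # amount in the envelope I didn't choose
for (i in 1:N)
{
  draw <- rexp(1)                  # the smaller amount
  envelopes <- c(draw, 2 * draw)   # one envelope holds twice the other
  indices <- sample(c(1:2), 2, replace = FALSE)  # random choice of envelope
  x[i] <- envelopes[indices[1]]
  y[i] <- envelopes[indices[2]]
}

# average ratio of chosen to unchosen, and of unchosen to chosen
m <- c(mean(x/y), mean(y/x))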


I’m generating a bunch of pairs of envelopes, one of which has twice as much money in it as the other. I then store the values of the one I chose and the one I didn’t choose, and at the end I look at the average ratio of one to the other. As has been calculated, they both come out to about 1.25. Therefore, like I said above, there’s no reason to prefer switching to not switching.

(I’m using the exponential distribution for convenience. Nothing in the simulation depends on the exact distribution of the smaller amount.)

You’re right that expected value is relevant–I didn’t say it wasn’t, rather, I said the particular weighted average that the OP was using isn’t relevant. The weighted average the OP is using isn’t the same as the weighted average you’re using.

The OP is taking the average of 2X and Y/2, where X = M, Y = 2M, and M ≠ 0. But he thinks it’s possible for X to equal Y. (He didn’t think this out loud, but it’s an assumption required to make his reasoning work.) The fact that X ≠ Y is part of what makes the average of 2X and Y/2 irrelevant to the question of what anything useful’s “expected value” is.

You on the other hand took the average of M and 2M. That’s better! :)

There are two main models to consider. The finite case and the infinite case.

  - The finite case might as well just be the case with two equiprobable possibilities for (your envelope, other envelope): (1, 2) and (2, 1). (Let’s use X as shorthand for “your envelope” and Y as shorthand for “the other envelope”.) In this case, E[Y/X] = 1.25 and E[X/Y] = 1.25 as well, since these are both just the average of 1/2 and 2/1. But this doesn’t imply that E[Y] and E[X] are both larger than each other. Indeed, it doesn’t even imply that E[Y | X = 1] is 1.25; in this case, upon seeing your envelope, you can gain information about whether the other envelope is larger or smaller (because you may discover your own envelope to have the smallest or largest possible value in it already). The information gain means, after conditioning on the value of your envelope, it’s no longer guaranteed to be a 50-50 split between “The other envelope is twice as large” and “The other envelope is half as large”; thus, glowacks’ point 2) kicks in, and, in fact, E[Y] = E[X] = 1.5.
  - The infinite case is illustrated with the large squares above. Again, E[Y/X] = 1.25 and E[X/Y] = 1.25 as well, but this is no more paradoxical than it was in the finite case. However, in the infinite case it is possible, in some sense, to maintain that seeing one envelope gives you no information about whether the other envelope is larger or smaller. The lack of information gain means, even after conditioning on the value of your envelope, there is still a 50-50 split between “The other envelope is twice as large” and “The other envelope is half as large”; thus, for any particular value k, we have that E[Y | X = k] = 1.25k. Taking the average over all values of k, this tells us that E[Y] = 1.25 * E[X], which is to say, E[Y] = E[X] + 0.25 * E[X]. This seems paradoxical, as we can also symmetrically find that E[X] = E[Y] + 0.25 * E[Y], which implies that each of E[X] and E[Y] is equal to the other one plus something positive. However, this is perfectly fine, because E[X] and E[Y] are both infinite. If two things are infinite, then there’s no problem in having each of them amount to the other one plus something positive. Thus, glowacks’ point 1) kicks in.
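The numbers quoted for the finite case can be checked by brute-force enumeration. Here is a minimal sketch in Python of the two-outcome model; the helper name `prob_other_larger_given` is just something I made up for illustration.

```python
from fractions import Fraction

# The two-outcome model from the finite case: (X, Y) pairs, equally likely.
outcomes = [(Fraction(1), Fraction(2)), (Fraction(2), Fraction(1))]
p = Fraction(1, len(outcomes))

e_x = sum(p * x for x, _ in outcomes)             # E[X]
e_y = sum(p * y for _, y in outcomes)             # E[Y]
e_y_over_x = sum(p * y / x for x, y in outcomes)  # E[Y/X]
e_x_over_y = sum(p * x / y for x, y in outcomes)  # E[X/Y]

def prob_other_larger_given(k):
    """P(Y > X | X = k), computed by direct enumeration (hypothetical helper)."""
    matching = [(x, y) for x, y in outcomes if x == k]
    return Fraction(sum(1 for x, y in matching if y > x), len(matching))

print(e_x, e_y, e_y_over_x, e_x_over_y)
print(prob_other_larger_given(1), prob_other_larger_given(2))
```

Both ratio expectations come out to 5/4 while both plain expectations are 3/2, and the conditional probability that the other envelope is larger is 1 when X = 1 and 0 when X = 2, never 1/2.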

*: If the mere fact that I am using symbols like “E[X]” rather than writing out “the expected value of your envelope” makes you fidgety, then there’s no helping you. They mean the same thing and are much quicker to write and read. No use getting fidgety about the mere trappings of simple math, particularly when one is asking a math question.

Let me put it this way: to compare things by comparing their expected values means comparing them by comparing their weighted averages. Which amounts to the same thing as comparing them by comparing their weighted sums. Which, in the case where all the weights are equal, amounts to just comparing their sums.

So when we ask “How does E[X] compare to E[Y]?”, we’re asking how the sum of all the X values in all the possibilities compares to the sum of all the Y values in all the possibilities. (And if we condition on some information, we’re still asking the same question, just narrowing the possibilities to include in our sum).

To see how these sums work in the case with infinitely many possibilities overall, look at the illustrative squares above (which it requires no great mathematical talent to understand; don’t let all my work drawing them go to waste!). People seem to get mad at me whenever I mention the infinite sums, but the question is fundamentally a question about infinite sums! The counterintuitive results are counterintuitive facts about infinite sums; there’s no getting around it. In particular, the reasoning which seems to imply that both E[X] and E[Y] are larger than each other is precisely the fact that I can construct two infinite sums which are each some positive term plus the other. That’s all that’s going on.

I agree that what happens when we stipulate that there are only finitely many possibilities overall is clear and easy to understand: you SHOULD switch if you open your envelope and find that it’s not the maximum possible value, and you SHOULDN’T switch if you open your envelope and find that it is the maximum possible value. Opening your envelope gives you information about whether or not you should switch; thus, you can’t just blindly switch without opening it.
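To make the finite case concrete, here is a rough sketch in Python. It assumes one particular finite setup that I am choosing purely for illustration: the smaller amount is equally likely to be 1, 2, 4, or 8, and you are handed one of the two envelopes at random. With a differently shaped finite distribution, the best rule for intermediate values could differ.

```python
from fractions import Fraction

# Hypothetical finite model: smaller amount uniform over these values.
SMALLER_AMOUNTS = [1, 2, 4, 8]
MAX_VALUE = 2 * max(SMALLER_AMOUNTS)  # largest amount any envelope can hold

def expected_payoff(should_switch):
    """Exact expected payoff of a rule mapping 'amount I see' to switch/keep."""
    total = Fraction(0)
    prob = Fraction(1, 2 * len(SMALLER_AMOUNTS))  # each (pair, pick) equally likely
    for s in SMALLER_AMOUNTS:
        for mine, other in ((s, 2 * s), (2 * s, s)):
            total += prob * (other if should_switch(mine) else mine)
    return total

never_switch = expected_payoff(lambda mine: False)
always_switch = expected_payoff(lambda mine: True)
switch_unless_max = expected_payoff(lambda mine: mine != MAX_VALUE)

print(never_switch, always_switch, switch_unless_max)
```

In this toy model, blindly switching and never switching both average 45/8, while "switch unless you are holding the maximum" averages 53/8. The entire gain comes from the information you get by looking.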

I disagree that this resolves the paradox, because I feel a fundamental implicit (or sometimes explicit) element of the paradox is that opening your envelope should give you no information about whether the other envelope is larger or smaller. To model this, one has to allow for infinitely many possible envelope values. The paradox is only truly engaged with by thinking about such an infinite possibility space.

To resolve it is to note that the very things which seem paradoxical about this situation are the counterintuitive possibilities one can arrange for with infinite sums. In particular, one can make two infinite sums (one for E[X] and one for E[Y]) which are simultaneously identical to each other and each equal to the other plus some positive term; equivalently (by looking at the difference E[X] - E[Y] between these), one can make infinite sums which come out to different values (positive, zero, or negative) depending on how they’re arranged, as illustrated by the squares above.

That is where the paradox fundamentally lies and is resolved: in the domain of infinite sums. Everyone yells at me for making them explicit, but A) the very fact that the case with highest and lowest possible envelope values is so easy to resolve is the indication that stipulating that there are only finitely many possible values does not actually engage with the paradox, and B) the kind of reasoning about infinite sums which does explain the unencumbered paradox isn’t something it takes great, or even any, mathematical talent to appreciate. There’s nothing to run away from in looking at those squares; this is kids’ stuff. You don’t need to know any mathematics coming in to appreciate it; it’s just “Look, see what happens when we set up these additions? That’s all that’s going on.” Literally, it could be shown to a child. I don’t see why everyone gets so upset when I mention the infinite sums.

Indistinguishable and other professional mathematicians here are treating OP as a springboard to devise infinite pdf’s that make the envelope paradox as paradoxical as possible. I’m sure it’s a good exercise for University-level math students, but OP has explicitly asked for minimization of math. As Indistinguishable admits, finite N would be adequate for OP’s needs; indeed N=2 is about as good as any! The most paradoxical examples all disappear when N is finite.

In the sequel, I assume N is finite. Since there are a finite number of gold atoms in the observable universe, this assumption seems adequate for OP’s purpose.

I’ve underlined part of this quote. I guess we could ask Little Nemo what he intended. I’ve likely missed Nemo’s intent, but I see no sign in OP that he made the underlined assumption. (Which never applies in finite case anyway.) Is this the assumption you’re making, glowack?

Other messages discuss
E[X/Y] = E[Y/X] = 1.25.
Taking the expected quotient at all seems like a red herring, unless you think this peculiar-looking equality is the paradox which concerns OP. That
1/E[X] ≠ E[1/X]
is hardly a surprise, and relevant here only in the sense that faulty “proofs” can be built from it.

This is for some particular infinite pdf? That pdf’s might satisfy glowack-2 was never in dispute. I took his comment to mean that glowack-2 must be satisfied (when glowack-1 and -3 are false).

No, that was in reference to a very finite pdf; the one with just two equiprobable possibilities: X = 1 and Y = 2, or X = 2 and Y = 1. The expected value of X is the average of 1 and 2, which is 1.5, and exactly the same for the expected value of Y. The reference to glowacks’ point 2 is in the sense that P(Y > X | X = some particular value) is not always 50%; in fact, in this setup, it’s never 50%.

(I shouldn’t have mentioned the trivial EX = EY = 1.5.)

Here is glowack-2:
“2) The probability of the other envelope having the larger amount is not actually 50%.”

The a priori unconditioned probability is always 1/2. The conditioned probability, when some particular X is known or assumed, is in general not 1/2. This is obvious; AFAIK, we’ve all been in agreement on this all along (possibly excepting OP).

But it sounds like you’re reading glowack’s “probability” as conditionable, i.e., “probability once the envelope’s contents are disclosed.”

That’s where I’m getting lost in recent discussion, and perhaps it’s a quibble of some sort. Glowack-2 as written above doesn’t imply a constraint on P(X > Y | X = k) for all k; the common-sense of the problem doesn’t imply it; and the wording of OP doesn’t imply it. As far as I can see, you seek to enforce the constraint with some artificial infinite pdf just because it makes the paradox more elegant, the math more interesting. Correct?

glowacks is correct that one of glowacks-1 and glowacks-2 must happen, in the following sense:

For any symmetric distribution of positive pairs (x, y) whose coordinates are always separated by a factor of 2, at least one of the following must be true:

  1. The expected value of the X-coordinate is infinite, or
  2. There is some possible X-coordinate k such that P(Y > X | X = k) and P(Y < X | X = k) are not equal.

Proof: Suppose 2) is false. Then P(Y > X | X = k) = P(Y < X | X = k) = 50%, for all possible X-coordinates k. This means E[Y | X = k] = 50% * 2k + 50% * k/2 = 1.25 * k, for all possible X-coordinates k; taking the expectation of both sides yields E[Y] = 1.25 * E[X].

However, at the same time, by the symmetry of the distribution, we must have E[X] = 1.25 * E[Y]. Substituting one equation into the other gives E[X] = 1.5625 * E[X], which no finite positive number satisfies. The only positive solution to these simultaneous equations is where both E[X] and E[Y] are infinite, making 1) true.

By “the constraint”, I assume you mean “That P(Y = 2X | X = k) = P(Y = X/2 | X = k) = .5, for all k”?

The reason I seek to assume this is because it’s thoroughly implicit in the OP’s calculation:

The use of the weights .5 in that calculation tells us that this constraint is being assumed. Obviously, with different weights, that calculation could be made to go differently, but the OP tells us they want to use weights of .5, which is to say, they’re telling us to use the assumption that P(Y = 2X | X = k) = P(Y = X/2 | X = k) = .5

So:

I’m disagreeing. I want the conditioned probability to be 1/2 as well; that’s the scenario the OP is discussing, and that’s the scenario I’m interested in discussing.

I agree that in the scenario where there’s a maximum possible envelope value, the conditioned probability will not generally be 1/2, because of course, in the case where one happens to find the maximum value in their own envelope, the other envelope is guaranteed to be lower and cannot possibly be higher, making the conditioned probabilities 1 and 0 rather than 1/2 and 1/2. Accordingly, though, I’m just not interested in the scenario where there’s a maximum possible envelope value; it doesn’t capture what the OP is talking about.

The reason I gave the values of .5 to both possibilities is because I assumed your first pick was random. You may have picked the envelope containing the larger or smaller amount with equal probability. If you picked the envelope with the smaller amount then the other envelope contains twice as much as your envelope. If you picked the envelope with the larger amount then the other envelope contains half as much as your envelope. This was how I got the value .5 (2X) + .5 (X/2).

Now a lot of people seem to be saying this is an impossible assumption because I can’t have picked both the larger and smaller amount - I picked one or the other. That’s obvious. But does that make it impossible to do any calculations? This seems to undermine the entire field of probability if you can’t work with a range of possibilities.