Interesting little math/probability game/puzzle

You’re playing a game. The way it works is that there are a fixed number of predefined values “available” in the game. For instance, “0,100,200”. You always “own” exactly one of these values, and are randomly assigned one to start the game. Your objective is to end with as high a total as possible.

At any point you can “stand”. So obviously in this example if you have 200, you stand.

Your other option is to pay a fixed predefined cost in “points” (the same units the values are counted in). If you do, you will switch to one of the other available values, at random.

So, let’s say that the fixed cost is 5. If you start with 100, you are paying 5 to randomly switch to either 0 or 200 (but effectively -5 or 195, as you’ve paid 5). If you end up at 195, obviously you can’t do any better at this point. If you end up at -5, you can pay another 5 to randomly switch to 100 or 200, but now you’ve paid 5 twice, so 90 or 190.

The question is: when should you switch and when should you stand, given the possible values, your current value, and the cost to switch.

Now, the level 1 answer is that your EV for switching is (average of the other values) - cost. If that is less than what you have, you should stand; otherwise you should switch. BUT, I’m pretty sure that is wrong, because switching also buys you the option to keep switching. By that logic, in this example, you would stand at 100, because the average of the other values is 100 and you have to pay 5 to switch, so your EV of switching is 95. BUT, crucially, if you end up at 0, then you get a REALLY good deal where you can pay another 5 to end up with an EV of 140. So I think that in this example, with these numbers, it is always correct to switch if the current value you own is not 200.
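Here’s a quick sketch checking that claim (my own back-of-the-envelope calculation, not part of the game itself): if the best play really is “switch unless you hold 200”, then holding 0 and holding 100 must be worth the same amount x, and x has to satisfy x = -5 + (x + 200) / 2, since a switch costs 5 and lands on 200 or on the other sub-200 value (itself worth x).

```python
# Sanity check of the (0, 100, 200) example with switch cost 5,
# assuming the policy "switch unless you hold 200".
x = 0.0
for _ in range(200):
    x = -5 + (x + 200) / 2  # iterate the fixed-point equation
print(round(x, 6))  # converges to 190.0

# Consistency checks: switching beats standing at 0 and 100 (190 > 0, 190 > 100),
# and standing beats switching at 200 (200 > -5 + (190 + 190) / 2 = 185).
```

So holding 0 or 100 is worth about 190 under that policy, comfortably above the 100 you’d lock in by standing, which backs up the always-switch-below-200 intuition.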

But, if instead of just (0,100,200) the numbers available were just (0,1,2,3…198,199,200) then I’m pretty sure this logic would not apply, and the straightforward EV math would be correct.

So… can anyone come up with a rule or formula that describes the optimal behavior? (If there is one, I don’t know it.)

If those are the numbers, with that huge a gap between the cost of switching and the differences between the rewards, I can’t imagine the optimal strategy isn’t to keep switching until you’ve got the max prize. As the cost increases this will change. It could be programmed up as a two-variable infinite-horizon dynamic programming problem.

I think you can do it as a one-variable dynamic program by treating the switching costs paid so far as sunk costs.
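For what it’s worth, a minimal sketch of that one-variable DP (my own value-iteration code, assuming a risk-neutral player and a uniform random switch to one of the other values):

```python
def solve(values, cost, iters=5000):
    """Value iteration: W(v) = max(v, -cost + mean of W(u) over u != v).
    Past switch payments are sunk, so the state is just the value you hold."""
    n = len(values)
    W = {v: float(v) for v in values}  # start from "stand everywhere"
    for _ in range(iters):
        total = sum(W.values())
        # Bellman update: stand, or pay the cost and switch at random
        W = {v: max(v, -cost + (total - W[v]) / (n - 1)) for v in values}
    total = sum(W.values())
    policy = {v: "stand" if v >= -cost + (total - W[v]) / (n - 1) else "switch"
              for v in values}
    return W, policy

W, policy = solve([0, 100, 200], 5)
print(policy)  # switch at 0 and 100, stand at 200
print(round(W[0], 2), round(W[100], 2))  # both about 190
```

On the simple example it recovers the “always switch below 200” answer, with both sub-200 states worth about 190.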

You can’t do it that way if you have a utility function over the net payoff, I’m pretty sure.

But it’s dinner time here.

You have to calculate your EV for continuing switching until you’ve reached some sort of end point. So for the simple example:

If you’re at 200 you stop.

If you’re at 100 you keep switching if you expect to reach 200 in fewer than 20 additional switches (20 switches at 5 points each would eat the whole 100-point gain), which you do: each switch lands on 200 with probability 1/2, so it takes only 2 switches on average.

If you’re at 0 you always switch.

And if you use the numbers from 0 to 200 it becomes horribly complicated.

Could you confirm the exact rules? I’m inferring that:

You are the only player.
You know the set of numbers.
You can pay to switch as many times as you wish.

If that’s the case, then

If the cost to switch were a modest value like 5, it would pay to switch and keep switching until you reach some high-ish number. I don’t see how any straightforward EV calculation tells you how high that number is, i.e. the point past which the EV of continuing to switch goes negative. But it would not depend on how many prior switches you had executed; that cost is sunk and does not influence future strategy. Whether to continue switching would depend solely on whether your current number is high enough.

Yes, those are the correct rules.

But, now that I think about it, it’s still trickier than I thought, because you can continue to switch. So you are paying 5 not just for a new random number between 0 and 200, but also for the-chance-to-pay-5-more-for-yet-another-new-number-between-0-and-200.
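Following that thought, here’s a value-iteration sketch for the dense 0..200 case with cost 5 (again my own code, assuming a uniform switch to one of the other 200 values). The naive one-step EV rule would have you stand somewhere in the mid-90s, but pricing in the option to keep re-rolling pushes the stand threshold much higher:

```python
values = list(range(201))  # 0, 1, ..., 200
cost = 5

# Value iteration: W(v) = max(v, -cost + mean of W(u) over u != v)
W = {v: float(v) for v in values}
for _ in range(5000):
    total = sum(W.values())
    W = {v: max(v, -cost + (total - W[v]) / (len(values) - 1)) for v in values}

# The optimal rule turns out to be a single cutoff: stand at or above it.
total = sum(W.values())
threshold = min(v for v in values
                if v >= -cost + (total - W[v]) / (len(values) - 1))
print(threshold)  # smallest value at which you should stand
```

With these numbers the threshold comes out around 156, far above the ~96 that the straightforward EV math suggests, so the chance-to-pay-5-more-for-another-roll really is worth something.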