Macro Quantum Effect witnessed: Implies time travel? Multiple Universes?

Just to clarify, I have no fondness for hidden variables, as such; quite the opposite, I am happiest to discuss only the ultimately predicted observations as the fundamental entities. But I don’t believe an epistemological account of wavefunction collapse requires the wavefunction to be taken as merely representing uncertainty about underlying hidden variables with definite values [just as I don’t believe an epistemological account of probability distribution collapse can only be carried out in the specific context of taking the probability distribution to represent uncertainty about underlying hidden variables with definite values]. While I’m at it, I think most alternative accounts of wavefunction collapse are bogged down in a morass of confusion [e.g., it certainly doesn’t matter to the physics whether a photon bounces off and enters your eye or not; that’s a confused reading of the “observation” terminology].

Just to clarify something else, I, er, have no fondness for hidden variables, as such, and, quite the opposite, etc., etc., am not claiming that there actually are underlying hidden variables with definite values, etc., etc. But I think much popular discussion about superpositions is glib and doesn’t actually emphasize any non-classical phenomena.

Anyway, all the more reason to stress the observational content of the manner in which quantum mechanics departs from the classical.

Well, let’s walk through an illustration of Bell’s theorem, and then decide what the consequences are.

We can set up a situation where there are two machines, located at different points in spacetime, outside of each other’s lightcones (basically, you can think of the setup as two machines at different locations in space, but with everything happening at the same time), with the following property: each machine can be fed an angle as input, and will then produce one bit of output (let’s call this “red” or “blue”). The interesting thing, observed empirically over many runs of this experiment, is that, if both machines are given the same angle as input, they always produce the same output. More generally, again as observed empirically over many runs of this experiment, out of all the situations where the difference between the two machines’ input angles is theta, the proportion of times where both machines output “red” is about cos^2(theta/2)/2, the proportion where both output “blue” is also about cos^2(theta/2)/2, the proportion where Machine 1 outputs “red” and Machine 2 outputs “blue” is about sin^2(theta/2)/2, and the proportion where Machine 1 outputs “blue” and Machine 2 outputs “red” is about sin^2(theta/2)/2. That is, about cos^2(theta/2) of the time, the machines produce the same output.
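(Not part of the original description, but to make those proportions concrete, here is a small Python sketch of mine that just evaluates the cos^2(theta/2) formulas above for a given angle difference; the function name is my own.)

[code]
import math

def predicted_proportions(theta_degrees):
    """Predicted joint output proportions when the two machines' input
    angles differ by theta_degrees (one detector 'flipped', as described)."""
    half = math.radians(theta_degrees) / 2
    same = math.cos(half) ** 2   # fraction of runs with matching outputs
    diff = math.sin(half) ** 2   # fraction of runs with mismatched outputs
    return {("red", "red"): same / 2, ("blue", "blue"): same / 2,
            ("red", "blue"): diff / 2, ("blue", "red"): diff / 2}

print(predicted_proportions(0))    # same angle: outputs always match
print(predicted_proportions(120))  # 120 degrees apart: outputs match only 1/4 of the time
[/code]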

In particular, let us simplify things even further, and only concern ourselves with three angles, each separated by 120 degrees. Then the setup is that each machine has three buttons (let’s call them A, B, and C) out of which one can be pressed, after which a single bit of output is produced; if the same button is pressed on both machines, then they always produce the same output. However, if different buttons are pressed, then only about a quarter of the time do they produce the same output.

So what? Well, now let’s think about modelling this with the particular mathematical construct of a probability distribution (on the space of all 30 possible complete input-and-output configurations; i.e., an assignment of a number >= 0 to each of these, with the total summing to 1). Which distribution? Well, let’s discuss various properties such a distribution could have. It could be the case that each of the nine particular input-configurations is equiprobable (i.e., the machines’ inputs are chosen independently and uniformly at random); we’ll say that a distribution with this property “doesn’t predict the inputs”. It could be the case that the events at Machine 1 are independent of those at Machine 2, in the sense that the probability of each complete input-and-output configuration is simply the product of the probabilities of the corresponding Machine 1 and Machine 2 configurations; we’ll say that a distribution with this property “fixes the common causes”. Finally, it could be the case that the distribution “satisfies the empirical condition”, in the sense that, for any particular input-configuration with mismatched inputs, the probability of getting that input-configuration together with matching outputs is a quarter of the probability of the input-configuration itself (i.e., given any particular pair of mismatched inputs, the outputs match a quarter of the time).
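(A sketch of my own, not from the original posts, just to pin these three properties down in code: a distribution is a dict assigning a probability to each of the 30 configurations, and each property is a predicate on such a dict. All the names here are mine.)

[code]
from itertools import product

INPUTS = ["A", "B", "C"]
OUTPUTS = ["red", "blue"]

# The 30 complete configurations (input1, input2, output1, output2);
# matching inputs with mismatched outputs never happen, so they are left out.
CONFIGS = [(x, y, z, w)
           for x, y, z, w in product(INPUTS, INPUTS, OUTPUTS, OUTPUTS)
           if not (x == y and z != w)]

def total(p, condition):
    """Probability of the set of configurations satisfying `condition`."""
    return sum(q for c, q in p.items() if condition(c))

def doesnt_predict_inputs(p, tol=1e-9):
    """Each of the nine input-configurations has probability 1/9."""
    return all(abs(total(p, lambda c: (c[0], c[1]) == (x, y)) - 1/9) <= tol
               for x in INPUTS for y in INPUTS)

def fixes_common_causes(p, tol=1e-9):
    """Each configuration's probability is the product of the probabilities
    of the corresponding Machine 1 and Machine 2 configurations."""
    return all(abs(p.get(c, 0)
                   - total(p, lambda d: (d[0], d[2]) == (c[0], c[2]))
                   * total(p, lambda d: (d[1], d[3]) == (c[1], c[3]))) <= tol
               for c in CONFIGS)

def satisfies_empirical_condition(p, tol=1e-9):
    """For each mismatched pair of inputs, the outputs match with a quarter
    of the probability of that input pair."""
    return all(abs(total(p, lambda c: (c[0], c[1]) == (x, y) and c[2] == c[3])
                   - total(p, lambda c: (c[0], c[1]) == (x, y)) / 4) <= tol
               for x in INPUTS for y in INPUTS if x != y)

# Quick sanity check: machines that ignore their inputs and always output red
# (with inputs uniform) fix the common causes but miss the empirical condition.
always_red = {c: (1/9 if (c[2], c[3]) == ("red", "red") else 0.0) for c in CONFIGS}
print(doesnt_predict_inputs(always_red),
      fixes_common_causes(always_red),
      satisfies_empirical_condition(always_red))   # True True False
[/code]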

Again, so what? Well, now we can prove an interesting fact. We’ll say that a probability distribution is “local” if it is a weighted average of distributions which fix the common causes and don’t predict the inputs (in mathematical jargon, a “mixture” of such distributions, or equivalently the marginal you get after averaging over some hidden parameter). The interesting fact is that no local distribution satisfies the empirical condition.

Proof: Let P be an arbitrary distribution which fixes the common causes and doesn’t predict the inputs. First, we will show that, under P, each input (A, B, or C) must make either an output of blue almost certain or an output of red almost certain (in the sense of the conditional probability being 1). This is because 0 = P(Machine 1 outputs red and Machine 2 outputs blue and Machine 1 has input A and Machine 2 has input A) [this configuration isn’t even in our space of 30 possibilities, since matching inputs never yield mismatched outputs] = P(Machine 1 outputs red and has input A) * P(Machine 2 outputs blue and has input A) [this last step following from the independence defining “fixes the common causes”]. Thus, either P(Machine 1 outputs red and has input A) or P(Machine 2 outputs blue and has input A) is equal to 0. Symmetrically, either P(Machine 1 outputs blue and has input A) or P(Machine 2 outputs red and has input A) is equal to 0. But P(Machine 1 outputs red and has input A) + P(Machine 1 outputs blue and has input A) = P(Machine 1 has input A) = 1/3, and similarly for Machine 2; it follows that either P(Machine 1 outputs red and has input A) = P(Machine 2 outputs red and has input A) = 0, or the same with blue substituted in for red. And symmetrically for inputs B and C. Accordingly, each input has a corresponding color which it makes almost certain.

Next, we will show that this means P(the two machines give the same output) is at least 5/9. For each of A, B, and C there is one color which it makes almost certain as output; this color has to be the same for at least two of the three inputs. Accordingly, P(the two machines give the same output) = P(the two machines give the same output and have the same input) + P(the two machines give the same output and have different inputs) = P(the two machines have the same input) + P(the two machines have differing inputs with matching assigned colors) = 1/3 + P(the two machines have differing inputs with matching assigned colors) >= 1/3 + 2/3 * 1/3 = 5/9 [since at least two inputs share their assigned color, the probability that Machine 1’s input is one of these is at least 2/3, and the probability that Machine 2’s input is then a different one of them is at least 1/3].

Finally, as the above held for an arbitrary distribution fixing the common causes and not predicting the inputs, we can conclude that, for any local distribution P, the probability that the two machines give the same output is a weighted average of values which are at least as large as 5/9, and thus is at least as large as 5/9 itself. Also, for any local distribution, the probability that both machines have the same input is 1/3, just by taking the weighted average of values which are all equal to 1/3. But then this cannot satisfy the empirical condition, for the empirical condition tells us that the probability that the two machines give the same output = the probability that both machines have the same input + 1/4 * (1 - the probability that both machines have the same input), which would equal 1/3 + 1/4 * 2/3 = 1/2, which is less than 5/9. This completes the proof.
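(A quick numerical cross-check of my own, not in the original post: by the first part of the proof, any distribution which fixes the common causes and doesn’t predict the inputs boils down to assigning one certain color to each input, so it suffices to check the eight deterministic assignments. Every one of them matches at least 5/9 of the time, while the empirical statistics give only 1/2.)

[code]
from fractions import Fraction
from itertools import product

INPUTS = ["A", "B", "C"]

def same_output_probability(assignment):
    """With the nine input pairs equiprobable and each machine simply
    outputting the color assigned to its input, the chance both match."""
    matches = sum(1 for x, y in product(INPUTS, INPUTS)
                  if assignment[x] == assignment[y])
    return Fraction(matches, 9)

assignments = [dict(zip(INPUTS, colors))
               for colors in product(["red", "blue"], repeat=3)]

print(min(same_output_probability(a) for a in assignments))   # 5/9
print(Fraction(1, 3) + Fraction(1, 4) * Fraction(2, 3))       # 1/2, which is < 5/9
[/code]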

Discussion of its implications can now follow.

Like any long post, there are various typos in the above which I am now too late to fix. Persevere through, valiant reader…

Oops, one important typo: I forgot some halvings of the angles in the formulas above, now reinstated. Luckily, it doesn’t change anything in the rest of the example (cos^2(120 degrees/2) is still 1/4).

I should also perhaps clarify that by “angle”, all I really mean is “direction in (in fact, three-dimensional) space”. (And if any physicists are reading, yes, I know that the most direct setup has agreement and disagreement switched around from the way I did it, but my way seemed a little easier to write up concisely, so I went ahead and had one detector “flipped”, so to speak.)

Since the empirical condition is satisfied, the probability distribution either predicts the inputs or doesn’t fix the common causes.

Having only a basic understanding of probability, I would have thought that all distributions on the inputs would fix the common causes–just relying here on the grade school notion that the probability of A AND B is the probability of A multiplied by the probability of B. This would imply that the probability distribution discussed in the theorem must predict the inputs. But I’ll stop there and ask what I’ve misunderstood already. (For example, am I right that the grade school notion of the probability of A AND B applies here and implies that all probability distributions on the inputs must fix the common causes?)

Remember, a probability distribution (at least, so far as our finitary purposes go) is just some way of assigning numbers >= 0 to each possibility in some space, with them all adding up to 1. The probability of any subset of the possibilities is then just the sum of the numbers assigned to its members. So nothing’s forcing P(A AND B) to equal P(A) * P(B) for arbitrary subsets A and B of possibilities.

In fact, the equality between the two is characteristic of A and B being probabilistically independent. Consider a coin flip (with heads and tails equiprobable). What’s the probability that it comes up heads? What’s the probability that it comes up heads AND it comes up heads? Is the latter the square of the former? Less trivially, consider the probability that a digit from 0 - 9 (each equiprobable) is even, the probability that it is prime, and the probability that it is even AND a prime. Is the third the product of the first two?
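(Spelling out the digit example with a quick check of my own, not in the original post:)

[code]
from fractions import Fraction

digits = range(10)
even = {d for d in digits if d % 2 == 0}            # 0, 2, 4, 6, 8
prime = {2, 3, 5, 7}

p_even = Fraction(len(even), 10)                    # 1/2
p_prime = Fraction(len(prime), 10)                  # 2/5
p_even_and_prime = Fraction(len(even & prime), 10)  # 1/10 (only the digit 2)

print(p_even * p_prime, p_even_and_prime)           # 1/5 vs 1/10: not independent
[/code]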

Seriously, is there no way of simplifying that?

What’s the least I’d need to learn to understand anything you said after the bit about Bell’s Theorem and deciding what the consequences are?

Well, you can start reading only from “In particular, let us simplify things further…”, and then skip the proof at the end, if you like. That’s just three paragraphs, with only grade school math. If that’s not suitably simple, then let me know what the stumbling block is.

:smack: Yes yes, I remember now, this is also from grade school.

Sorry!

I’ll give this some more thought before responding again.

Well, it doesn’t make sense to say “the empirical condition is satisfied” in itself. But, of course, it’s clear what you were saying:

A particular probability distribution on our sample space of 30 possibilities may or may not satisfy the empirical condition. I gave a very weak empirical condition, which doesn’t uniquely specify a distribution, since it was all that was needed for my theorem; but there is one particular distribution which concerns us more than others: the one in which the probability of “Machine 1 has input X, Machine 2 has input Y, Machine 1 has output Z, and Machine 2 has output W” is 1/18 when X = Y and Z = W, 1/72 when X is not equal to Y but Z = W, and 1/24 when X is not equal to Y and Z is not equal to W. We may as well call this “the empirical distribution”. This is the distribution which matches all the experimentally observed frequency proportions when we choose the machine inputs uniformly at random. It’s the unique distribution which doesn’t predict the inputs, satisfies what I called “the empirical condition”, and is also symmetric under interchange of the two possible output values.

You can read right off that the empirical distribution does not “fix the common causes”. Remember, this just means that it does not make the goings-on at Machine 1 probabilistically independent from those at Machine 2. That’s clear enough: unconditionally, the probability that Machine 2 has input A and output blue, for example, is 1/6. However, conditioned on the event that Machine 1 has input A and output red, the probability that Machine 2 has input A and output blue drops to 0. Learning information about the goings-on at Machine 1 can tell you something about the goings-on at Machine 2.
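(Here is a self-contained sketch of mine that writes out the empirical distribution above and reproduces those two numbers; it’s pure bookkeeping, nothing beyond the definitions already given, and the names are my own.)

[code]
from fractions import Fraction
from itertools import product

INPUTS = ["A", "B", "C"]
OUTPUTS = ["red", "blue"]
CONFIGS = [(x, y, z, w)
           for x, y, z, w in product(INPUTS, INPUTS, OUTPUTS, OUTPUTS)
           if not (x == y and z != w)]              # the 30 possibilities

def empirical_probability(config):
    x, y, z, w = config
    if x == y:                                      # matching inputs (outputs match too)
        return Fraction(1, 18)
    return Fraction(1, 72) if z == w else Fraction(1, 24)

empirical = {c: empirical_probability(c) for c in CONFIGS}

def prob(condition):
    return sum(p for c, p in empirical.items() if condition(c))

print(sum(empirical.values()))                      # 1

m2_A_blue = prob(lambda c: c[1] == "A" and c[3] == "blue")
m1_A_red = prob(lambda c: c[0] == "A" and c[2] == "red")
joint = prob(lambda c: c[0] == "A" and c[2] == "red"
             and c[1] == "A" and c[3] == "blue")

print(m2_A_blue)          # 1/6 unconditionally
print(joint / m1_A_red)   # 0 once we condition on Machine 1 having input A, output red
[/code]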

In itself, this is not terribly surprising; why shouldn’t there be correlation? But we might expect that any correlation arises from some property of a common cause (“If the winning numbers printed in New York are correlated with the winning numbers printed in LA, it’s no surprise; they’re both taken from readings originally made in Chicago, and then sent out to the coasts (with some small chance of printing errors at the end)”), and that, therefore, a probability distribution which has already incorporated all the information there is to know about all the properties of all the common causes would make the goings-on independent (“Once I know the readings in Chicago, I can’t gain any more information about the numbers in New York by asking about the numbers in LA; learning about printing errors at one end tells me nothing about printing errors at the other end”). Hence, the name I gave to that condition.

Furthermore, we would then expect that the empirical distribution itself is just a probabilistically weighted average of the various distributions conditioned on further information about the common causes. (If Y_1, Y_2, …, are such that one and only one of them will occur, then the probability of any arbitrary X can always be taken as the weighted average of the probabilities of X conditioned on each Y_i, each weighted by the probability of that Y_i).
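(In symbols, that parenthetical is just the law of total probability, with the Y_i mutually exclusive and exhaustive; my notation, not the original poster’s:)

$$P(X) = \sum_i P(X \mid Y_i)\, P(Y_i)$$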

Now, remember, the interesting theorem isn’t that the empirical distribution happens not to fix the common causes. The interesting theorem is that the empirical distribution can’t even arise as a weighted average of distributions which fix the common causes and don’t predict the inputs.

One reading of this: it’s not possible to have extra information which both “accounts for” the correlation between Machine 1 and Machine 2 and is not itself correlated with the inputs. (Of course, the whole point of writing out the example and the (very simple) math is so you can decide for yourself what kind of reading you would like to give it.)

I don’t understand why people have taken this to mean local hidden variable theories can’t work. By the theorem together with the fact that the empirical condition is satisfied, it seems like the most natural thing to say is that either the two inputs aren’t independent of each other, or the 9 combinations aren’t equiprobable. The most natural way to understand that, in turn, would seem to me to be to hypothesize that there is some kind of “common cause” to the two events: a hidden variable, in other words. And why not a local one?

So, if you’re explaining Bell’s theorem (or anyway, if I am understanding you correctly, a special but representative particular case of it), and if I’ve understood it right, I do not understand why mathematicians and physicists have taken the empirical confirmation of it to mean local hidden variable theories are false. What you’ve explained seems to me rather to suggest the truth of some local hidden variable theory or other.

(The above was written before your last post)

Wait, why not? Doesn’t this just mean that observations confirm the probability is 1/4?

Well, when you run the experiments, you’re free to choose the inputs to the machines however you want. You can roll dice at each end, if you like. If you go down the road of denying that the inputs are independent, you’re saying that the die rolls at the two ends are correlated; if this arises locally, it must mean that the separate die rolls’ values are actually influenced by some event way back which communicated with both dice. Seems implausible, doesn’t it?

ETA: Er, this was in response to your penultimate post, not your ultimate post.

All I’m pointing out is that I defined “the empirical condition” as a predicate upon probability distributions. Some satisfy it, some don’t. There is, it is true, a particular distribution satisfying it which concerns us most (what I called “the empirical distribution” in that post), but there are also others which don’t satisfy it. I’m just highlighting this conceptual organization; we can talk about any probability distribution we like, even ones which aren’t the empirical distribution. Indeed, we have to, just to begin discussing the theorem.

Anyway, if you like, we can forget “the empirical condition”, and only talk about “the empirical distribution”; the theorem is then that “the empirical distribution is not a weighted average of distributions which fix the common causes and don’t predict the inputs”. The empirical condition was just one particular property which the empirical distribution satisfied, which happened to be sufficient for the theorem, but, you know, the further generality doesn’t really concern us, so, whatever.

(I’ll explain why it is that I used the word “local” for being a weighted average of distributions which fix the common causes and don’t predict the inputs in a bit, but the idea is basically all in those parentheticals about why we wouldn’t be surprised by correlations in NY and LA printings of some Chicago lottery.)

I wouldn’t think it implausible, if in fact the outcomes are conforming to the empirical distribution.

Now, if I actually did this with two machines by rolling two dice which never had anything to do with each other, then I wouldn’t come up with the empirical distribution, and there’d be no reason to think the two dice have some single influence causing both their results.

But if I do get the empirical distribution, then don’t I have a reason to think the two dice have somehow been coordinated by some common cause?

I think you may be confusing the inputs to the machines and the outputs to the machines.

The inputs to the machines (one of three possibilities at each) are observed to come up in each of the nine possibilities with equal frequencies. No correlation is observed there. These are the dice, so to speak.

The correlation observed is that the outputs of the machines (one of two possibilities at each) are never different if their inputs are the same, while the outputs of the machines are the same only about a quarter of the time when the inputs are different.

(Our whole setup has 30 complete input-output possibilities: 6 where both machines have the same input and output, and 24 where both machines have different inputs and some (maybe different, maybe the same) outputs.)

You’re right…

ETA: I’ll come back to the thread later, re-reading and paying special care not to get confused about that.

I should also say, the usual presentation of the theorem is a priori much weaker than the way I’ve put it: the usual presentation only demonstrates that the empirical distribution cannot arise as a weighted average of distributions which don’t predict the inputs but completely specify everything else, in the sense that “each of the three inputs is assigned a corresponding output which is guaranteed for any machine by that input”. However, as demonstrated by the beginning of my proof, we can actually generalize: fixing the common causes, even though a priori a much weaker property, turns out to entail the quoted property, which I think is well worth observing.

It really belongs in GD, but here’s a quick finish to my earlier side-track.
I think I’ll revise my point a little bit re: QM is ‘weird.’ I still think it is to the lay audience, but only because of its relative newness. ~90yrs is certainly old, but compare that to the 400yrs we’ve had to get used to Newton. Also, while it does wonders for fleshing out our understanding of the universe, QM hasn’t contributed any commonly used technology that would lead to more widespread familiarity. In 50-100yrs, people may very well take QM for granted while the newest big thing (brane theory?) will seem bizarre even though it accurately describes things.

I think perhaps better evocative terminology for “fixes the common causes” would be “is entanglement-free”. Or, rather, I think it’s useful to see the connection between both perspectives on what it means (for a probability distribution to decompose into an independent product). The point being that entanglement merely amounts to correlation. And that, in the relevant situation, the interesting thing isn’t the correlation itself, but the fact that any variable which accounts for the correlation (in the sense that the correlation disappears after conditioning on this variable) must potentially carry information about the outcome of the dice rolls.